Gabor Kaszab created IMPALA-13314:
-------------------------------------
Summary: Create a store for HadoopCatalogs to avoid creating a new
one for each table
Key: IMPALA-13314
URL: https://issues.apache.org/jira/browse/IMPALA-13314
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: Gabor Kaszab
Currently, when we create a new Iceberg table in HadoopCatalog, we create a new
HadoopCatalog instance for each of these tables
[here|https://github.com/apache/impala/blob/4b500a55cbfcdd311a1c766e33849f7ae05a1a8e/fe/src/main/java/org/apache/impala/util/IcebergUtil.java#L145].
The problem is that a catalog object such as HadoopCatalog holds an Iceberg
FileIO instance, and each such instance can consume megabytes of memory. This
can blow up the catalog/localCatalog memory even if all the Iceberg tables in
HadoopCatalog are empty.
As a solution, we should have a HadoopCatalog store keyed by location string:
a lookup either returns the HadoopCatalog object already cached for that
location or creates a new HadoopCatalog and caches it in the store. With this
approach, tables under the same HadoopCatalog location would share the same
HadoopCatalog instance, and we won't end up with as many FileIO instances as
there are tables in HadoopCatalog.
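
A minimal sketch of such a store, to make the idea concrete. This is not
existing Impala code: the class name CatalogStore, its methods, and the
generic parameter C (a stand-in for HadoopCatalog, so the sketch compiles
without Iceberg on the classpath) are all hypothetical. It also assumes the
raw location string is a good enough key; normalizing equivalent locations
(for example trailing slashes) is left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical store that hands out one catalog instance per location string.
 * C stands in for HadoopCatalog; the factory is whatever call currently
 * builds a catalog for a given location.
 */
final class CatalogStore<C> {
  // Location string -> cached catalog instance.
  private final ConcurrentMap<String, C> catalogs = new ConcurrentHashMap<>();
  private final Function<String, C> factory;

  CatalogStore(Function<String, C> factory) {
    this.factory = factory;
  }

  /** Returns the cached catalog for the location, creating it on first use. */
  C getOrCreate(String location) {
    // ConcurrentHashMap.computeIfAbsent applies the factory at most once per
    // key, so concurrent callers for the same location share one instance.
    return catalogs.computeIfAbsent(location, factory);
  }

  /** Number of distinct catalog instances currently cached. */
  int size() {
    return catalogs.size();
  }
}
```

With a store like this, two tables under the same location would get the same
catalog object back from getOrCreate(), so the number of FileIO instances
would follow the number of distinct locations rather than the number of
tables. The sketch never evicts entries; whether that is acceptable would
depend on how many distinct locations are expected.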