Gabor Kaszab created IMPALA-13314:
-------------------------------------

             Summary: Create a store for HadoopCatalogs to avoid creating a new 
one for each table
                 Key: IMPALA-13314
                 URL: https://issues.apache.org/jira/browse/IMPALA-13314
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Gabor Kaszab


Currently, when we create a new Iceberg table in HadoopCatalog, we create a new 
HadoopCatalog instance for each of these tables; see 
[here|https://github.com/apache/impala/blob/4b500a55cbfcdd311a1c766e33849f7ae05a1a8e/fe/src/main/java/org/apache/impala/util/IcebergUtil.java#L145].

The problem is that a catalog object such as HadoopCatalog holds an Iceberg 
FileIO instance, and a single such instance can consume megabytes of memory. 
This can blow up the catalog/localCatalog memory usage even if the Iceberg 
tables in HadoopCatalog are empty.

As a solution, we should have a kind of HadoopCatalog store keyed by the 
location string: a lookup returns the cached HadoopCatalog object for that 
location, or creates a new HadoopCatalog and caches it in the store for later 
use. With this approach, tables under the same HadoopCatalog location would 
share the same HadoopCatalog instance, and we won't end up with as many FileIO 
instances as there are tables in HadoopCatalog.
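
A minimal sketch of what such a store could look like, not a proposal for the 
final implementation. The class name HadoopCatalogStore, the getOrCreate method 
and the factory parameter are hypothetical names introduced for illustration. 
The catalog type is left generic so the sketch stays self-contained; in Impala 
the factory would presumably wrap whatever IcebergUtil does today to construct 
a HadoopCatalog for a location.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical store that keeps one catalog instance per location string.
 * C stands in for the catalog type (HadoopCatalog in the real code).
 */
final class HadoopCatalogStore<C> {
    // Location string -> cached catalog instance.
    private final Map<String, C> catalogs = new ConcurrentHashMap<>();
    // Assumption: builds a new catalog for a given location.
    private final Function<String, C> factory;

    HadoopCatalogStore(Function<String, C> factory) {
        this.factory = factory;
    }

    /** Returns the cached catalog for the location, creating it on first use. */
    C getOrCreate(String location) {
        // computeIfAbsent invokes the factory at most once per key,
        // even when several threads ask for the same location.
        return catalogs.computeIfAbsent(location, factory);
    }

    /** Number of distinct catalog instances currently cached. */
    int size() {
        return catalogs.size();
    }
}
```

With something like this, two tables under the same location would get the 
same catalog object (and therefore share one FileIO), while a table under a 
different location would get its own. The sketch compares location strings 
verbatim and never evicts entries; whether locations need normalizing (for 
example trailing slashes) and whether entries should expire are open questions.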



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
