Zoltán Borók-Nagy created IMPALA-11784:
------------------------------------------

             Summary: Don't unnecessarily call Iceberg's planFiles() during table loading
                 Key: IMPALA-11784
                 URL: https://issues.apache.org/jira/browse/IMPALA-11784
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
            Reporter: Zoltán Borók-Nagy


Iceberg's planFiles() API is very expensive because it involves reading the 
Avro manifest files. It's especially expensive on object stores, though 
manifest caching can help here.

Currently we invoke this API twice during table loading (via 
IcebergUtil.getIcebergFiles()): once in loadAllPartition() and once in 
loadPartitionStats().

We should just invoke IcebergUtil.getIcebergFiles() once, then pass the result 
object to loadAllPartition() and loadPartitionStats().
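
A minimal sketch of the proposed change, not the actual Impala code. The class 
PlanOnceSketch, the call counter, the List<String> result type, and the method 
signatures are all assumptions made for illustration; the real 
IcebergUtil.getIcebergFiles() reads manifests through planFiles() and returns 
its own result object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class PlanOnceSketch {
    // Counts how often the expensive planning step runs (illustration only).
    static final AtomicInteger planCalls = new AtomicInteger();

    // Hypothetical stand-in for IcebergUtil.getIcebergFiles(). The real method
    // is expensive because planFiles() reads the Avro manifest files.
    static List<String> getIcebergFiles() {
        planCalls.incrementAndGet();
        List<String> files = new ArrayList<>();
        files.add("data/part-0.parquet");
        files.add("data/part-1.parquet");
        return files;
    }

    // Hypothetical signature: takes the already planned files instead of
    // calling getIcebergFiles() itself.
    static int loadAllPartition(List<String> files) {
        return files.size();
    }

    // Hypothetical signature: reuses the same planned files for the stats.
    static int loadPartitionStats(List<String> files) {
        int totalNameLength = 0;
        for (String f : files) {
            totalNameLength += f.length();
        }
        return totalNameLength;
    }

    // Table loading plans the files once and hands the result to both steps.
    static void loadTable() {
        List<String> files = getIcebergFiles();
        loadAllPartition(files);
        loadPartitionStats(files);
    }

    public static void main(String[] args) {
        loadTable();
        System.out.println("planning calls: " + planCalls.get());
    }
}
```

With this shape, the number of planFiles() invocations per table load no longer 
depends on how many loading steps consume the file list.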



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
