Zoltán Borók-Nagy created IMPALA-11784:
------------------------------------------
Summary: Don't unnecessarily call Iceberg's planFiles() during table loading
Key: IMPALA-11784
URL: https://issues.apache.org/jira/browse/IMPALA-11784
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Zoltán Borók-Nagy
Iceberg's planFiles() API is very expensive because it involves reading the
Avro manifest files. It's especially expensive on object stores, though
manifest caching can help here.
Currently we invoke this API twice during table loading (via
IcebergUtil.getIcebergFiles()): once in loadAllPartition() and once in
loadPartitionStats().
We should just invoke IcebergUtil.getIcebergFiles() once, then pass the result
object to loadAllPartition() and loadPartitionStats().
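The proposed change (plan once, share the result) can be sketched as below. This is a minimal, self-contained illustration under stated assumptions, not Impala's actual code. The class PlannedFiles, the counting planFiles() stub, and the parameterized signatures of loadAllPartition() and loadPartitionStats() are all hypothetical stand-ins for the real IcebergUtil.getIcebergFiles() result and catalog methods.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Minimal sketch. PlannedFiles, planFiles(), loadAllPartition(PlannedFiles) and
// loadPartitionStats(PlannedFiles) are HYPOTHETICAL stand-ins, not the real
// Impala or Iceberg APIs.
class IcebergLoadSketch {
    // Hypothetical holder for the result of one (expensive) planning pass.
    static final class PlannedFiles {
        private final List<String> dataFiles;

        PlannedFiles(List<String> dataFiles) {
            this.dataFiles = Collections.unmodifiableList(dataFiles);
        }

        List<String> dataFiles() { return dataFiles; }
    }

    private int planFilesCalls = 0;

    // Stand-in for IcebergUtil.getIcebergFiles(), which ends up calling
    // Iceberg's planFiles() and reading the Avro manifest files.
    // Here it only counts invocations and returns fixed file names.
    PlannedFiles planFiles() {
        planFilesCalls++;
        return new PlannedFiles(Arrays.asList("data/f1.parquet", "data/f2.parquet"));
    }

    int planFilesCalls() { return planFilesCalls; }

    // Consumers take the precomputed result instead of planning files themselves.
    int loadAllPartition(PlannedFiles files) {
        return files.dataFiles().size();
    }

    long loadPartitionStats(PlannedFiles files) {
        // Illustrative "statistic": the number of data files.
        return files.dataFiles().size();
    }

    // Table loading: plan the files once, then pass the same object to both steps.
    void load() {
        PlannedFiles files = planFiles();
        loadAllPartition(files);
        loadPartitionStats(files);
    }

    public static void main(String[] args) {
        IcebergLoadSketch loader = new IcebergLoadSketch();
        loader.load();
        System.out.println("planFiles() calls: " + loader.planFilesCalls());
    }
}
```

Running main prints "planFiles() calls: 1". In the current code path each consumer triggers its own planning pass, so the equivalent count would be 2.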
--
This message was sent by Atlassian Jira
(v8.20.10#820010)