Zoltán Borók-Nagy created IMPALA-12298:
------------------------------------------
Summary: Improve incremental load of Iceberg tables
Key: IMPALA-12298
URL: https://issues.apache.org/jira/browse/IMPALA-12298
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Zoltán Borók-Nagy
*The following mostly affects HDFS/Ozone, where we need to contact the NameNode
to create file descriptors with block locations. On cloud object stores where
there are no block locations, we only need the Iceberg metadata to create the
file descriptors.*
Currently we always reload all the metadata belonging to an Iceberg table.
This means we recreate all the file descriptors even if only a few of them have
changed.
We could check the number of newly added files, and if there are only a few
of them, load the file descriptors for just those files one by one.
We can fall back to a full reload if a significant number of files have changed,
i.e. when a recursive file listing is more efficient.
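
The idea above could be sketched roughly as follows. This is an illustration only, not Impala's catalog code: the function name, the `load_one` / `load_all` callbacks and the `max_incremental_files` threshold are all hypothetical, and a file descriptor is modelled as a plain dict.

```python
from typing import Callable, Dict, Iterable

# Hypothetical stand-in for a file descriptor (path plus block locations).
FileDescriptor = dict


def reload_file_descriptors(
    cached: Dict[str, FileDescriptor],
    current_paths: Iterable[str],
    load_one: Callable[[str], FileDescriptor],
    load_all: Callable[[], Dict[str, FileDescriptor]],
    max_incremental_files: int = 100,  # assumed threshold, not a real Impala flag
) -> Dict[str, FileDescriptor]:
    """Return descriptors for exactly the files in the current snapshot."""
    current = set(current_paths)
    added = current - cached.keys()

    if len(added) > max_incremental_files:
        # Many files changed: one recursive listing is assumed to be cheaper
        # than contacting the NameNode once per file.
        listing = load_all()
        return {p: fd for p, fd in listing.items() if p in current}

    # Few files changed: reuse cached descriptors for files that are still
    # part of the table (removed files are dropped implicitly) ...
    result = {p: cached[p] for p in current if p in cached}
    # ... and load only the newly added files, one by one.
    for path in sorted(added):
        result[path] = load_one(path)
    return result
```

In this sketch, files that were removed from the table simply do not appear in the result, so only the added files cost a remote call on the incremental path.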
--
This message was sent by Atlassian Jira
(v8.20.10#820010)