Zoltán Borók-Nagy created IMPALA-12298:
------------------------------------------

             Summary: Improve incremental load of Iceberg tables
                 Key: IMPALA-12298
                 URL: https://issues.apache.org/jira/browse/IMPALA-12298
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
            Reporter: Zoltán Borók-Nagy


*The following mostly affects HDFS/Ozone, where we need to contact the NameNode 
to create file descriptors with block locations. On cloud object stores, where 
there are no block locations, the Iceberg metadata alone is enough to create the 
file descriptors.*

Currently we always reload all the metadata belonging to an Iceberg table.
This means we recreate all the file descriptors even if only a few of the 
table's files have changed.

We could check the number of newly added files, and if there are only a few 
of them, load the file descriptors for just those files, one by one.

We can fall back to a full reload if a significant number of files have changed, 
i.e. when a recursive file listing is cheaper than loading the files one by one.
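A minimal sketch of the proposed decision, assuming the catalog keeps the previous descriptors in a map keyed by path and can get the set of data file paths in the current Iceberg snapshot. All names here (IncrementalFdLoader, reload, maxNewFraction, loadOne, fullListing) are hypothetical and are not Impala's catalog API. Expressing the threshold as a fraction of the table's files is also an assumption, since the issue does not say how "a significant number" should be measured.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: names are illustrative and are not Impala's catalog API.
public class IncrementalFdLoader {

  /** Stand-in for a file descriptor; on HDFS/Ozone the real one also holds block locations. */
  public record FileDescriptor(String path, long length) {}

  /** Result of a reload, recording which strategy was used. */
  public record ReloadResult(Map<String, FileDescriptor> fds, boolean fullReload,
                             int loadedOneByOne) {}

  /**
   * Rebuilds the descriptor map for the data files of the current snapshot.
   *
   * @param cached           descriptors from the previous load, keyed by path
   * @param currentDataFiles data file paths listed in the current Iceberg snapshot
   * @param maxNewFraction   assumed threshold: fraction of new files above which
   *                         a full reload is preferred
   * @param loadOne          loads a single descriptor (e.g. one NameNode call)
   * @param fullListing      rebuilds all descriptors (e.g. a recursive listing)
   */
  public static ReloadResult reload(Map<String, FileDescriptor> cached,
                                    Set<String> currentDataFiles,
                                    double maxNewFraction,
                                    Function<String, FileDescriptor> loadOne,
                                    Function<Set<String>, Map<String, FileDescriptor>> fullListing) {
    // Iceberg data files are immutable, so only paths not seen before need new descriptors.
    Set<String> added = new HashSet<>(currentDataFiles);
    added.removeAll(cached.keySet());

    // Too many new files: fall back to a full reload (assumed cheaper in this case).
    if (added.size() > maxNewFraction * currentDataFiles.size()) {
      return new ReloadResult(fullListing.apply(currentDataFiles), true, 0);
    }

    // Few new files: reuse cached descriptors and load only the new ones one by one.
    // Files no longer in the snapshot are dropped because we iterate the current set.
    Map<String, FileDescriptor> fds = new HashMap<>();
    for (String path : currentDataFiles) {
      FileDescriptor fd = cached.get(path);
      fds.put(path, fd != null ? fd : loadOne.apply(path));
    }
    return new ReloadResult(fds, false, added.size());
  }

  public static void main(String[] args) {
    Map<String, FileDescriptor> cached = new HashMap<>();
    for (int i = 0; i < 100; i++) {
      String path = "data/f" + i + ".parquet";
      cached.put(path, new FileDescriptor(path, 1024L));
    }
    Set<String> current = new HashSet<>(cached.keySet());
    current.add("data/new.parquet");

    // One new file out of 101 is below the 10% threshold, so only it is loaded.
    ReloadResult r = reload(cached, current, 0.1,
        p -> new FileDescriptor(p, 2048L),
        paths -> { throw new IllegalStateException("full reload not expected"); });
    System.out.println(r.fullReload() + " " + r.loadedOneByOne() + " " + r.fds().size());
  }
}
```

Running main prints `false 1 101`: the single new file is loaded individually and the other 100 descriptors are reused. Passing the two loading strategies in as functions keeps the threshold logic independent of whether the table lives on HDFS/Ozone or on an object store.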



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
