[
https://issues.apache.org/jira/browse/IMPALA-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltán Borók-Nagy resolved IMPALA-12298.
----------------------------------------
Fix Version/s: Impala 4.3.0
Resolution: Fixed
> Improve incremental load of Iceberg tables
> ------------------------------------------
>
> Key: IMPALA-12298
> URL: https://issues.apache.org/jira/browse/IMPALA-12298
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg, performance
> Fix For: Impala 4.3.0
>
>
> *The following mostly affects HDFS/Ozone, where we need to contact the
> NameNode to create file descriptors with block locations. On cloud object
> stores where there are no block locations, we only need the Iceberg metadata
> to create the file descriptors.*
> Currently we always reload all the metadata belonging to an Iceberg table.
> This means we recreate all the file descriptors even if only a few of them
> have changed.
> We could check the number of newly added files, and if there are only a few
> of them, load the file descriptors for just those files one by one.
> We can fall back to a full reload if a significant number of files have
> changed, i.e. when it is better to use a recursive file listing.
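
The decision described above could be sketched roughly as follows. This is only an illustration, not Impala's actual catalog code: the function name, the `load_one` and `load_all` callbacks, and the `max_new_files` threshold are all hypothetical, and the issue does not say what the real cutoff is.

```python
from typing import Callable, Dict, Iterable


def reload_file_descriptors(
    cached: Dict[str, object],
    current_paths: Iterable[str],
    load_one: Callable[[str], object],
    load_all: Callable[[], Dict[str, object]],
    max_new_files: int = 100,  # hypothetical threshold, not taken from the issue
) -> Dict[str, object]:
    """Return file descriptors for the table's current data files.

    cached        -- descriptors from the previous load, keyed by file path
    current_paths -- data file paths listed in the latest Iceberg metadata
    load_one      -- loads a single descriptor (one storage lookup per file)
    load_all      -- full reload, e.g. backed by a recursive file listing
    """
    current = set(current_paths)
    new_paths = current.difference(cached)

    # Many new files: per-file lookups would cost more than one full listing.
    if len(new_paths) > max_new_files:
        return load_all()

    # Few new files: reuse cached descriptors for files that still exist
    # and load only the newly added ones, one by one.
    result = {path: fd for path, fd in cached.items() if path in current}
    for path in sorted(new_paths):
        result[path] = load_one(path)
    return result
```

In this sketch, descriptors of files that are no longer referenced by the table are dropped simply by not copying them over, so the incremental path costs one lookup per added file and nothing for unchanged files.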