[
https://issues.apache.org/jira/browse/IMPALA-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18078367#comment-18078367
]
ASF subversion and git services commented on IMPALA-14804:
----------------------------------------------------------
Commit 042b915c9ec7feb0398bdec84027e908ada59725 in impala's branch
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=042b915c9 ]
IMPALA-14804: Improve incremental refresh of Iceberg tables
Instead of getting all files in the table with planFiles(), only
check the files added in snapshots since the last loaded one.
This is valid only if all changes are appends - if there are
deletes or compactions (replace), the old logic is used.
If the snapshot didn't change (e.g. only a table property changed),
file loading is skipped entirely.
Note that getting file statuses from FS was already optimized
by reusing file descriptors of already loaded files.
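
The decision described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the commit: the Snapshot class, plan_refresh() and list_all_files are hypothetical names, and the operation strings only mirror the append/delete/replace distinction the commit message draws.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Snapshot:
    """Hypothetical, simplified stand-in for an Iceberg snapshot."""
    snapshot_id: int
    parent_id: Optional[int]
    operation: str  # e.g. "append", "delete", "replace"
    added_files: List[str] = field(default_factory=list)


def plan_refresh(
    snapshots: Dict[int, Snapshot],
    last_loaded_id: Optional[int],
    current_id: Optional[int],
    list_all_files: Callable[[], List[str]],
) -> Tuple[str, List[str]]:
    """Return (mode, files), where mode is "skip", "incremental" or "full"."""
    # Snapshot unchanged (e.g. only a table property changed):
    # nothing to load.
    if current_id == last_loaded_id:
        return "skip", []

    new_files: List[str] = []
    cursor = current_id
    # Walk back from the current snapshot to the last loaded one.
    while cursor != last_loaded_id:
        snap = snapshots.get(cursor)
        # Fall back to the full listing (the "old logic") if the chain
        # never reaches the last loaded snapshot or if any snapshot in
        # between is not a pure append.
        if snap is None or snap.operation != "append":
            return "full", list_all_files()
        # Prepend so that the result is ordered oldest snapshot first.
        new_files = snap.added_files + new_files
        cursor = snap.parent_id
    return "incremental", new_files
```

In this sketch the full listing stands in for planFiles(); it is only called on the fallback path, which is what makes small appends to large tables cheap.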
The change makes it much faster to refresh tables with many files
after small changes. Example:
~1M file table, 25K partitions, inserting 1 file
"Loaded file and block metadata": 5s->0.02s
Change-Id: Ife1ebd2f054fdab7b96487091feac8cbd5b5cdc0
Generated-by: Claude Sonnet 4.6
Reviewed-on: http://gerrit.cloudera.org:8080/24210
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Improve incremental updates of Iceberg tables
> ---------------------------------------------
>
> Key: IMPALA-14804
> URL: https://issues.apache.org/jira/browse/IMPALA-14804
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Csaba Ringhofer
> Assignee: Csaba Ringhofer
> Priority: Critical
>
> Currently all files are taken from Iceberg and compared to old file
> descriptors based on hash. This avoids reloading file descriptors, but can
> still be very slow if the table is large.