Fu Lili created IMPALA-13254:
--------------------------------
Summary: Optimizing incremental reload performance of Iceberg tables
Key: IMPALA-13254
URL: https://issues.apache.org/jira/browse/IMPALA-13254
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Affects Versions: Impala 4.4.0
Reporter: Fu Lili
Assignee: Fu Lili
When performing a {{REFRESH}} on an Iceberg table, if the number of changed
files exceeds the {{iceberg_reload_new_files_threshold}} configuration (default
is 100), a highly inefficient reload operation is triggered.
The main issue lies in the
{{IcebergFileMetadataLoader.getFileStatuses}} function. During incremental
loading, the {{listWithLocations}} parameter is always set to {{false}}, so
{{fs.getFileStatus}} and {{fs.getFileBlockLocations}} are called
sequentially, one file at a time, for each {{contentFile}} (if the filesystem
supports {{StorageIds}}).
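
To make the cost concrete, here is a minimal, self-contained sketch that counts
simulated filesystem round trips. It is not Impala code: {{CountingFs}},
{{perFileCost}} and {{perDirectoryCost}} are hypothetical names, and the
"one round trip per call" model is an assumption used only to compare a
per-file lookup pattern with a per-directory listing that returns locations
together with the statuses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ListingCostSketch {
  /** Hypothetical stand-in for a remote filesystem; it only counts calls. */
  static class CountingFs {
    int roundTrips = 0;

    // Models a per-file status lookup: one round trip per file.
    String getFileStatus(String path) {
      roundTrips++;
      return "status:" + path;
    }

    // Models a per-file block location lookup: one more round trip per file.
    String getFileBlockLocations(String path) {
      roundTrips++;
      return "blocks:" + path;
    }

    // Models a directory listing that returns status and locations for
    // every known file in that directory in a single round trip.
    List<String> listWithLocations(String dir, List<String> knownFiles) {
      roundTrips++;
      List<String> located = new ArrayList<>();
      for (String f : knownFiles) {
        if (parentOf(f).equals(dir)) located.add("located:" + f);
      }
      return located;
    }
  }

  static String parentOf(String path) {
    return path.substring(0, path.lastIndexOf('/'));
  }

  // Sequential pattern described in this issue: two calls per changed file.
  static int perFileCost(List<String> changedFiles) {
    CountingFs fs = new CountingFs();
    for (String f : changedFiles) {
      fs.getFileStatus(f);
      fs.getFileBlockLocations(f);
    }
    return fs.roundTrips;
  }

  // Alternative pattern: one listing per distinct parent directory.
  static int perDirectoryCost(List<String> changedFiles) {
    CountingFs fs = new CountingFs();
    Set<String> dirs = new HashSet<>();
    for (String f : changedFiles) dirs.add(parentOf(f));
    for (String d : dirs) fs.listWithLocations(d, changedFiles);
    return fs.roundTrips;
  }

  public static void main(String[] args) {
    List<String> files = new ArrayList<>();
    for (int i = 0; i < 200; i++) {
      files.add("/warehouse/tbl/data/part=" + (i % 4) + "/f" + i + ".parquet");
    }
    System.out.println("per-file round trips: " + perFileCost(files));
    System.out.println("per-directory round trips: " + perDirectoryCost(files));
  }
}
```

With 200 changed files spread over 4 partition directories, the per-file
pattern makes 400 simulated round trips and the per-directory pattern makes 4.
A real directory listing also returns files that did not change, so it is not
always cheaper; this only illustrates why the sequential pattern scales poorly
with the number of changed files.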
--
This message was sent by Atlassian Jira
(v8.20.10#820010)