Fu Lili created IMPALA-13254:
--------------------------------

             Summary: Optimizing incremental reload performance of Iceberg tables
                 Key: IMPALA-13254
                 URL: https://issues.apache.org/jira/browse/IMPALA-13254
             Project: IMPALA
          Issue Type: Improvement
          Components: Catalog
    Affects Versions: Impala 4.4.0
            Reporter: Fu Lili
            Assignee: Fu Lili


When performing a {{REFRESH}} on an Iceberg table, if the number of changed 
files exceeds the {{iceberg_reload_new_files_threshold}} configuration (default 
is 100), a highly inefficient reload operation is triggered.

The main issue lies in the {{IcebergFileMetadataLoader.getFileStatuses}} 
function. During incremental loading, the {{listWithLocations}} parameter is 
always set to {{false}}, so {{fs.getFileStatus}} and 
{{fs.getFileBlockLocations}} are called sequentially for each {{contentFile}} 
(if the filesystem supports {{StorageIds}}).
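
To illustrate why the sequential per-file path gets expensive, here is a rough cost model in Python. It is not Impala code: {{FakeFileSystem}}, {{load_per_file}}, {{load_with_listing}} and the table path are hypothetical names made up for this sketch, and it assumes that every filesystem call costs one round trip and that a listing with locations returns status and block locations together in a single call. Real filesystems page large listings, so the absolute numbers are only indicative.

```python
from dataclasses import dataclass


@dataclass
class FakeFileSystem:
    """Hypothetical stand-in for a remote filesystem; counts round trips."""
    files: dict  # path -> size in bytes
    round_trips: int = 0

    def get_file_status(self, path):
        # One round trip per file (models a per-file status lookup).
        self.round_trips += 1
        return {"path": path, "size": self.files[path]}

    def get_file_block_locations(self, path):
        # A second round trip per file (models a per-file block lookup).
        self.round_trips += 1
        return [{"offset": 0, "length": self.files[path]}]

    def list_located(self, directory):
        # Assumption: one call returns status and block locations together.
        self.round_trips += 1
        return [
            {"path": p, "size": s, "blocks": [{"offset": 0, "length": s}]}
            for p, s in self.files.items()
            if p.startswith(directory)
        ]


def load_per_file(fs, content_files):
    """Sequential path: two calls for every changed file."""
    loaded = []
    for path in content_files:
        status = fs.get_file_status(path)
        status["blocks"] = fs.get_file_block_locations(path)
        loaded.append(status)
    return loaded


def load_with_listing(fs, directory, content_files):
    """Listing path: one call, then keep only the changed files."""
    wanted = set(content_files)
    return [f for f in fs.list_located(directory) if f["path"] in wanted]


# 150 changed files, i.e. above the default threshold of 100.
files = {f"/warehouse/tbl/data/f{i}.parquet": 1024 for i in range(150)}
changed = list(files)

fs_per_file = FakeFileSystem(dict(files))
per_file_result = load_per_file(fs_per_file, changed)

fs_listing = FakeFileSystem(dict(files))
listing_result = load_with_listing(fs_listing, "/warehouse/tbl/data/", changed)

print(fs_per_file.round_trips)  # 300
print(fs_listing.round_trips)   # 1
```

Under these assumptions both paths return the same metadata, but the per-file path needs two round trips per changed file while the listing path needs a constant number, so the gap grows linearly with the number of changed files.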



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
