LiPenglin created IMPALA-11662:
----------------------------------

             Summary: Improve "refresh iceberg_tbl_on_oss;" performance
                 Key: IMPALA-11662
                 URL: https://issues.apache.org/jira/browse/IMPALA-11662
             Project: IMPALA
          Issue Type: Improvement
            Reporter: LiPenglin


Iceberg provides rich metadata, and directory listing on an object storage
service (OSS), e.g. one accessed through S3A, costs more than it does on HDFS.
We could therefore create the file descriptors from Iceberg metadata instead
of calling org.apache.hadoop.fs.FileSystem#listFiles:
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L189
The only thing missing from the Iceberg metadata is the last_modification_time
of the files. But since Iceberg files are immutable, we could probably just
assign a special fixed timestamp to these files.
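
A rough sketch of the idea, not Impala's actual code: every class, field, and
constant name below is hypothetical, and the sentinel value 0 is only an
assumption about what the "special timestamp" might be. DataFileEntry stands
in for the path and size that Iceberg manifests already record for each data
file, so no FileSystem#listFiles call is needed to build the descriptors.

```java
import java.util.ArrayList;
import java.util.List;

public class IcebergFileDescriptorSketch {
    // Hypothetical sentinel used in place of a real modification time.
    // Iceberg files are immutable, so the value never needs to change.
    public static final long IMMUTABLE_FILE_MTIME = 0L;

    // Hypothetical stand-in for the per-file info found in Iceberg metadata.
    public static final class DataFileEntry {
        public final String path;
        public final long sizeBytes;

        public DataFileEntry(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    // Hypothetical, simplified file descriptor (not Impala's FileDescriptor).
    public static final class FileDesc {
        public final String path;
        public final long length;
        public final long modificationTime;

        public FileDesc(String path, long length, long modificationTime) {
            this.path = path;
            this.length = length;
            this.modificationTime = modificationTime;
        }
    }

    // Builds descriptors purely from metadata entries, with no directory
    // listing against the underlying file system.
    public static List<FileDesc> fromMetadata(List<DataFileEntry> entries) {
        List<FileDesc> result = new ArrayList<>(entries.size());
        for (DataFileEntry e : entries) {
            result.add(new FileDesc(e.path, e.sizeBytes, IMMUTABLE_FILE_MTIME));
        }
        return result;
    }
}
```

With this shape, a refresh would only need to read the table's current
snapshot metadata; the cost no longer depends on how expensive listing is on
the object store.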



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
