Zoltán Borók-Nagy created IMPALA-12299:
------------------------------------------

             Summary: Parallelize file listings of Iceberg tables on HDFS/Ozone
                 Key: IMPALA-12299
                 URL: https://issues.apache.org/jira/browse/IMPALA-12299
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
            Reporter: Zoltán Borók-Nagy


*The following only affects HDFS/Ozone, where we need to contact the NameNode to 
create file descriptors with block locations. On cloud object stores, where 
there are no block locations, we only need the Iceberg metadata to create the 
file descriptors.*

Currently, we do one large recursive file listing of the table directory to 
load all the files (including their block locations) of an Iceberg table.

Instead, we could read the Iceberg metadata, identify the partitions, and 
then load the file descriptors of each partition in parallel.
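
A minimal sketch of this idea, written in Python only for brevity (it is not 
Impala code, and the Catalog itself is Java). It assumes the data file paths 
have already been read from the Iceberg metadata and that files of one 
partition share a parent directory. `list_dir` is a hypothetical callable 
standing in for a single non-recursive directory listing that returns file 
descriptors with block locations; the function names and `max_workers` are 
illustrative as well.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import posixpath


def group_by_partition_dir(data_file_paths):
    """Group data file paths (taken from table metadata) by parent directory."""
    groups = defaultdict(set)
    for path in data_file_paths:
        groups[posixpath.dirname(path)].add(path)
    return groups


def load_file_descriptors(data_file_paths, list_dir, max_workers=8):
    """List every partition directory in parallel and collect descriptors.

    list_dir(directory) is a hypothetical callable that returns a dict
    {file_path: descriptor} for one non-recursive listing of `directory`.
    Only files referenced by the metadata are kept; anything else found
    in a directory is ignored.
    """
    groups = group_by_partition_dir(data_file_paths)
    if not groups:
        return {}

    descriptors = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One listing task per directory instead of one recursive listing.
        futures = {d: pool.submit(list_dir, d) for d in groups}
        for directory, future in futures.items():
            listing = future.result()  # re-raises any listing error
            for path in groups[directory]:
                if path in listing:
                    descriptors[path] = listing[path]
    return descriptors
```

With this shape, each directory costs one listing call, and the calls can 
overlap instead of being serialized inside a single recursive traversal.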

We cannot really reuse ParallelFileMetadataLoader in its current form as it 
works on HdfsPartitions, and Iceberg tables are treated as non-partitioned 
tables in the Impala Catalog, i.e. the actual Iceberg partitions are hidden 
from it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
