Zoltán Borók-Nagy created IMPALA-12299:
------------------------------------------
Summary: Parallelize file listings of Iceberg tables on HDFS/Ozone
Key: IMPALA-12299
URL: https://issues.apache.org/jira/browse/IMPALA-12299
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Zoltán Borók-Nagy
*The following only affects HDFS/Ozone, where we need to contact the NameNode to
create file descriptors with block locations. On cloud object stores where
there are no block locations, we only need the Iceberg metadata to create the
file descriptors.*
Currently we do one big recursive file listing on the table directory to
load all the files (including their block locations) of an Iceberg table.
Instead, we could read the Iceberg metadata, identify the partitions, and
then load the file descriptors of those partitions in parallel.
We cannot really reuse ParallelFileMetadataLoader in its current form as it
works on HdfsPartitions, and Iceberg tables are treated as non-partitioned
tables in the Impala Catalog, i.e. the actual Iceberg partitions are hidden
from it.
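A minimal sketch of the idea, not a proposed patch. It assumes the data file
paths have already been read from the Iceberg metadata, and it approximates
"partitions" by the parent directories of those files. The class name, the
method names, and the listDir callback (a stand-in for a per-directory
listing that returns files with block locations) are all hypothetical and
are not existing Impala APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names exist in Impala.
public class ParallelIcebergListingSketch {

  /** Returns the parent directory of a path: "/t/data/p=1/a.parq" -> "/t/data/p=1". */
  static String parentDir(String filePath) {
    int idx = filePath.lastIndexOf('/');
    return idx <= 0 ? "/" : filePath.substring(0, idx);
  }

  /**
   * Lists every distinct directory that contains at least one data file,
   * running up to numThreads listings concurrently.
   *
   * @param dataFilePaths data file paths taken from the Iceberg metadata
   * @param listDir       assumed callback that performs one non-recursive
   *                      listing of a directory
   * @return directory -> listing result, sorted by directory name
   */
  static Map<String, List<String>> listInParallel(
      List<String> dataFilePaths,
      Function<String, List<String>> listDir,
      int numThreads) {
    // Derive the set of directories to list from the metadata, instead of
    // recursively walking the whole table directory.
    List<String> dirs = dataFilePaths.stream()
        .map(ParallelIcebergListingSketch::parentDir)
        .distinct()
        .collect(Collectors.toList());

    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      // Submit one listing task per directory.
      Map<String, CompletableFuture<List<String>>> futures = new TreeMap<>();
      for (String dir : dirs) {
        futures.put(dir, CompletableFuture.supplyAsync(() -> listDir.apply(dir), pool));
      }
      // Wait for all tasks; join() rethrows a failure as an unchecked exception.
      Map<String, List<String>> result = new TreeMap<>();
      for (Map.Entry<String, CompletableFuture<List<String>>> e : futures.entrySet()) {
        result.put(e.getKey(), e.getValue().join());
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }
}
```

Deriving the directories from the metadata sidesteps the fact that the
Catalog does not see the Iceberg partitions. A real implementation would
also have to cope with data files that live outside the table directory.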