Work on IMPALA-12299 started by Peter Rozsa.
--------------------------------------------
> Parallelize file listings of Iceberg tables on HDFS/Ozone
> ---------------------------------------------------------
>
> Key: IMPALA-12299
> URL: https://issues.apache.org/jira/browse/IMPALA-12299
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Zoltán Borók-Nagy
> Assignee: Peter Rozsa
> Priority: Major
> Labels: impala-iceberg, performance
>
> *The following only affects HDFS/Ozone, where we need to contact the NameNode
> to create file descriptors with block locations. On cloud object stores, where
> there are no block locations, we only need the Iceberg metadata to create the
> file descriptors.*
> Currently we do one large recursive file listing of the table directory
> to load all the files (including block locations) in an Iceberg table.
> Instead, we could read the Iceberg metadata, identify the partitions,
> and then load the file descriptors of those partitions in parallel.
> We cannot really reuse ParallelFileMetadataLoader in its current form as it
> works on HdfsPartitions, and Iceberg tables are treated as non-partitioned
> tables in the Impala Catalog, i.e. the actual Iceberg partitions are hidden
> from it.
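
The proposed approach can be illustrated with a minimal sketch. This is not Impala code: the function names (group_by_directory, list_directory, load_file_descriptors) are hypothetical, the local filesystem stands in for HDFS/Ozone, and a file size stands in for a file descriptor with block locations. It assumes the data file paths have already been read from the Iceberg metadata. It groups them by parent directory, then performs one non-recursive listing per directory on a thread pool instead of a single recursive listing of the table root.

```python
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_directory(data_file_paths):
    """Group data file paths (as recorded in table metadata) by parent directory."""
    groups = defaultdict(set)
    for path in data_file_paths:
        groups[os.path.dirname(path)].add(os.path.basename(path))
    return groups


def list_directory(directory, wanted_names):
    """Non-recursive listing of one directory (stand-in for a NameNode call).

    Returns {full_path: size_in_bytes} for files referenced by the metadata;
    files present on disk but absent from the metadata are ignored.
    """
    found = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name in wanted_names:
                found[entry.path] = entry.stat().st_size
    return found


def load_file_descriptors(data_file_paths, max_workers=8):
    """List every directory that holds data files, one task per directory."""
    groups = group_by_directory(data_file_paths)
    descriptors = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each (directory, file names) pair becomes an independent listing task.
        for result in pool.map(lambda item: list_directory(*item), groups.items()):
            descriptors.update(result)
    return descriptors
```

Because the directories are derived from the metadata, the sketch never needs a catalog-level notion of partitions, which is the limitation noted above for ParallelFileMetadataLoader.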