[ 
https://issues.apache.org/jira/browse/IMPALA-12299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Rozsa reassigned IMPALA-12299:
------------------------------------

    Assignee: Peter Rozsa

> Parallelize file listings of Iceberg tables on HDFS/Ozone
> ---------------------------------------------------------
>
>                 Key: IMPALA-12299
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12299
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Peter Rozsa
>            Priority: Major
>              Labels: impala-iceberg
>
> *The following only affects HDFS/Ozone, where we need to contact the NameNode 
> to create file descriptors with block locations. On cloud object stores, where 
> there are no block locations, we only need the Iceberg metadata to create the 
> file descriptors.*
> Currently we are doing one big recursive file listing on the table directory 
> to load all the files (with block locations as well) in an Iceberg table.
> Instead of this, we could look at the Iceberg metadata, identify the 
> partitions, then load the file descriptors of each partition in parallel.
> We cannot really reuse ParallelFileMetadataLoader in its current form as it 
> works on HdfsPartitions, and Iceberg tables are treated as non-partitioned 
> tables in the Impala Catalog, i.e. the actual Iceberg partitions are hidden 
> from it.
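
The idea described above can be sketched roughly as follows. This is only an
illustration, not Impala's implementation: the class ParallelListingSketch and
its methods are hypothetical, the parent directories of the data files are
assumed to be a reasonable stand-in for the Iceberg partitions (Iceberg does
not require that layout), and the `lister` function is a placeholder for
whatever NameNode call returns the files with their block locations (for
example, something built on Hadoop's FileSystem.listLocatedStatus).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: list each directory referenced by the table metadata
// in its own task instead of doing one big recursive listing.
class ParallelListingSketch {

  // Collects the distinct parent directories of the data file paths that the
  // table metadata reports. Assumption: one directory roughly equals one partition.
  static Set<String> parentDirs(List<String> dataFilePaths) {
    Set<String> dirs = new TreeSet<>();
    for (String path : dataFilePaths) {
      int slash = path.lastIndexOf('/');
      dirs.add(slash > 0 ? path.substring(0, slash) : "/");
    }
    return dirs;
  }

  // Submits one listing task per directory to a fixed-size thread pool and
  // returns the listing results keyed by directory. `lister` is a placeholder
  // for the real (NameNode-backed) directory listing.
  static Map<String, List<String>> listInParallel(List<String> dataFilePaths,
      Function<String, List<String>> lister, int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      Map<String, Future<List<String>>> pending = new TreeMap<>();
      for (String dir : parentDirs(dataFilePaths)) {
        pending.put(dir, pool.submit(() -> lister.apply(dir)));
      }
      Map<String, List<String>> result = new TreeMap<>();
      for (Map.Entry<String, Future<List<String>>> e : pending.entrySet()) {
        result.put(e.getKey(), e.getValue().get());  // waits for that task
      }
      return result;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while listing", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Listing failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Because the sketch takes only a list of paths, it does not depend on
HdfsPartition objects, which is the obstacle to reusing
ParallelFileMetadataLoader that the description points out. A real change
would still need to map the listed files back to the Iceberg data files.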



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
