[ https://issues.apache.org/jira/browse/IMPALA-12299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Rozsa reassigned IMPALA-12299:
------------------------------------

    Assignee: Peter Rozsa

> Parallelize file listings of Iceberg tables on HDFS/Ozone
> ---------------------------------------------------------
>
>                 Key: IMPALA-12299
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12299
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Peter Rozsa
>            Priority: Major
>              Labels: impala-iceberg
>
> *The following only affects HDFS/Ozone, where we need to contact the NameNode
> to create file descriptors with block locations. On cloud object stores, where
> there are no block locations, we only need the Iceberg metadata to create the
> file descriptors.*
> Currently we do one big recursive file listing on the table directory
> to load all the files (with block locations as well) in an Iceberg table.
> Instead, we could look at the Iceberg metadata, identify the
> partitions, then load the file descriptors in them in parallel.
> We cannot really reuse ParallelFileMetadataLoader in its current form, as it
> works on HdfsPartitions, and Iceberg tables are treated as non-partitioned
> tables in the Impala Catalog, i.e. the actual Iceberg partitions are hidden
> from it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
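
To illustrate the idea in the description (derive the directories from the table metadata, then list them in parallel instead of doing one recursive listing), here is a rough sketch. It is not Impala code and does not use ParallelFileMetadataLoader or the Iceberg API. It assumes the data file paths have already been read from the Iceberg metadata, uses the local filesystem via os.scandir as a stand-in for NameNode listing calls, and records only file sizes where a real loader would record block locations. The function names group_by_directory, list_directory and load_file_descriptors are hypothetical.

```python
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_directory(data_file_paths):
    """Group data file paths (as recorded in table metadata) by parent directory."""
    groups = defaultdict(set)
    for path in data_file_paths:
        groups[os.path.dirname(path)].add(os.path.basename(path))
    return groups


def list_directory(directory, wanted_names):
    """List one directory non-recursively and keep only the wanted files.

    Returns {full_path: size_in_bytes}. Assumption: on HDFS/Ozone this is
    where a listing call that also returns block locations would go.
    """
    found = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name in wanted_names:
                found[entry.path] = entry.stat().st_size
    return found


def load_file_descriptors(data_file_paths, max_workers=8):
    """List every directory referenced by the metadata in parallel."""
    groups = group_by_directory(data_file_paths)
    descriptors = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(list_directory, directory, names)
            for directory, names in groups.items()
        ]
        for future in futures:
            # result() re-raises any listing error from the worker thread.
            descriptors.update(future.result())
    return descriptors
```

Files that exist in a directory but are not referenced by the metadata are ignored, which mirrors the point that the Iceberg metadata, not the directory contents, decides which files belong to the table.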