[ https://issues.apache.org/jira/browse/IMPALA-11752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Noemi Pap-Takacs updated IMPALA-11752:
--------------------------------------
Issue Type: New Feature (was: Bug)
> Handle s3:// paths in Iceberg tables
> ------------------------------------
>
> Key: IMPALA-11752
> URL: https://issues.apache.org/jira/browse/IMPALA-11752
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend, Frontend
> Reporter: Zoltán Borók-Nagy
> Assignee: Gabor Kaszab
> Priority: Major
> Labels: impala-iceberg
>
> Components using
> [S3FileIO|https://iceberg.apache.org/docs/latest/aws/#s3-fileio] might write
> out file paths starting with 's3://' instead of 's3a://'. The latter scheme
> is used by
> [HadoopFileIO|https://iceberg.apache.org/docs/latest/aws/#hadoop-s3a-filesystem],
> which is what Impala uses.
> By default, HadoopFileIO doesn't interpret paths starting with 's3://'.
> (This could probably be resolved by setting "fs.s3.impl" to
> "org.apache.hadoop.fs.s3a.S3AFileSystem" so that an S3A filesystem instance
> is also created for s3:// paths.)
> [FeIcebergTable.Utils.FeIcebergTable()|https://github.com/apache/impala/blob/2733d039ad4a830a1ea34c1a75d2b666788e39a9/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java#L671-L689]
> depends on the file paths returned by the recursive file listing matching
> the file paths in the Iceberg metadata files. However, the recursive listing
> returns s3a:// paths while the metadata contains s3:// paths, so the files
> are not found in the hash map 'hdfsFileDescMap' and are loaded one by one.
> Moreover, position delete file processing is also based on exact matches of
> the file URIs, so position delete entries with s3:// paths won't have the
> desired effect.
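The "fs.s3.impl" workaround mentioned in the issue relies on Hadoop choosing a FileSystem implementation from the "fs.<scheme>.impl" configuration key. The sketch below is a simplified, stdlib-only model of that lookup, not Hadoop's actual code: the class SchemeResolver and its methods are hypothetical, and real Hadoop also consults implementations registered through ServiceLoader before giving up.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of how Hadoop maps a URI scheme to a
// FileSystem implementation class via the "fs.<scheme>.impl" key.
public class SchemeResolver {
    private final Map<String, String> conf = new HashMap<>();

    public void set(String key, String value) {
        conf.put(key, value);
    }

    // Returns the implementation class configured for the path's scheme, or
    // throws if none is configured (Hadoop reports a similar
    // "No FileSystem for scheme" error in this situation).
    public String resolve(String path) {
        String scheme = URI.create(path).getScheme();
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            throw new IllegalArgumentException("No FileSystem for scheme \"" + scheme + "\"");
        }
        return impl;
    }

    public static void main(String[] args) {
        SchemeResolver resolver = new SchemeResolver();
        resolver.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");

        // Without a mapping for "s3", an s3:// path written by S3FileIO cannot be resolved.
        try {
            resolver.resolve("s3://bucket/warehouse/t/data/f1.parquet");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // The suggested workaround: route s3:// to the S3A implementation as well.
        resolver.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        System.out.println(resolver.resolve("s3://bucket/warehouse/t/data/f1.parquet"));
    }
}
```

Note that this only makes s3:// paths readable; the listed files would still come back as s3a:// paths, so the exact-match lookups described in the issue would still miss.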
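One possible mitigation for the exact-match problem is to normalize paths to a single scheme before using them as hash map keys, both for the file descriptor lookup and for the data file paths referenced by position delete entries. This is a minimal sketch assuming that s3:// and s3a:// URIs with the same bucket and key refer to the same object; PathNormalizer, normalize() and the plain HashMap standing in for 'hdfsFileDescMap' are hypothetical and do not reflect Impala's actual types.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PathNormalizer {
    // Hypothetical helper: rewrites an "s3://" prefix to "s3a://" and leaves
    // every other scheme untouched.
    static String normalize(String path) {
        return path.startsWith("s3://") ? "s3a://" + path.substring("s3://".length()) : path;
    }

    public static void main(String[] args) {
        // Stand-in for 'hdfsFileDescMap': keyed by paths from the recursive listing (s3a://).
        Map<String, String> fileDescMap = new HashMap<>();
        fileDescMap.put("s3a://bucket/t/data/f1.parquet", "descriptor-f1");

        // Path as written into Iceberg metadata by S3FileIO.
        String metadataPath = "s3://bucket/t/data/f1.parquet";
        System.out.println(fileDescMap.get(metadataPath));            // null: exact match fails
        System.out.println(fileDescMap.get(normalize(metadataPath))); // descriptor-f1

        // Position delete entries reference (data file path, row position); keying
        // them by the normalized path lets them match the listed data files.
        Map<String, Set<Long>> deletedPositions = new HashMap<>();
        deletedPositions
            .computeIfAbsent(normalize("s3://bucket/t/data/f1.parquet"), k -> new HashSet<>())
            .add(7L);
        System.out.println(deletedPositions.containsKey("s3a://bucket/t/data/f1.parquet")); // true
    }
}
```

Normalizing toward s3a:// rather than s3:// keeps the listing side unchanged, since that is the scheme HadoopFileIO already returns.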
--
This message was sent by Atlassian Jira
(v8.20.10#820010)