[ 
https://issues.apache.org/jira/browse/IMPALA-11752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noemi Pap-Takacs updated IMPALA-11752:
--------------------------------------
    Issue Type: New Feature  (was: Bug)

> Handle s3:// paths in Iceberg tables
> ------------------------------------
>
>                 Key: IMPALA-11752
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11752
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend, Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Gabor Kaszab
>            Priority: Major
>              Labels: impala-iceberg
>
> Components using 
> [S3FileIO|https://iceberg.apache.org/docs/latest/aws/#s3-fileio] might write 
> out file paths starting with 's3://' instead of 's3a://'. The latter is used 
> by 
> [HadoopFileIO|https://iceberg.apache.org/docs/latest/aws/#hadoop-s3a-filesystem]
>  which Impala uses.
> By default, HadoopFileIO doesn't interpret paths starting with 's3://'. 
> (This could probably be resolved by setting "fs.s3.impl" to 
> "org.apache.hadoop.fs.s3a.S3AFileSystem" so that an s3a fs instance is 
> created.)
> [FeIcebergTable.Utils.FeIcebergTable()|https://github.com/apache/impala/blob/2733d039ad4a830a1ea34c1a75d2b666788e39a9/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java#L671-L689]
>  depends on the file paths returned by recursive file listing matching the 
> file paths in Iceberg metadata files. But the recursive listing returns 
> s3a:// paths, while the metadata contains s3:// paths, so we won't find the 
> files in the hash map 'hdfsFileDescMap' and we'll load them one-by-one.
> Moreover, position delete file processing is also based on exact matches 
> of the file URIs. Therefore entries with s3:// paths won't have the 
> desired effect.
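
The mismatch described above can be illustrated with a minimal sketch. It is not Impala's code: the class S3PathNormalizer, its methods, and the sample paths are hypothetical, and a plain HashMap stands in for 'hdfsFileDescMap'. It assumes that rewriting the 's3://' prefix to 's3a://' before the lookup is an acceptable way to make the two sources agree, which is only one possible approach.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Impala: maps s3:// paths to s3a://
// so that paths from Iceberg metadata can match recursive listing results.
final class S3PathNormalizer {
    private static final String S3_PREFIX = "s3://";
    private static final String S3A_PREFIX = "s3a://";

    // Returns the path with an "s3://" prefix replaced by "s3a://".
    // Any other path (including null and "s3a://...") is returned unchanged.
    static String normalize(String path) {
        if (path != null && path.startsWith(S3_PREFIX)) {
            return S3A_PREFIX + path.substring(S3_PREFIX.length());
        }
        return path;
    }

    // Looks up a metadata path in a map keyed by listed (s3a://) paths.
    static <V> V lookup(Map<String, V> listedFiles, String metadataPath) {
        return listedFiles.get(normalize(metadataPath));
    }

    public static void main(String[] args) {
        // Stand-in for the map built from the recursive file listing.
        Map<String, Long> listedFiles = new HashMap<>();
        listedFiles.put("s3a://bucket/tbl/data/f1.parquet", 1024L);

        // Path as a writer using S3FileIO might record it in metadata.
        String metadataPath = "s3://bucket/tbl/data/f1.parquet";

        System.out.println(listedFiles.get(metadataPath));      // exact match fails: null
        System.out.println(lookup(listedFiles, metadataPath));  // normalized match: 1024
    }
}
```

The same normalization would have to be applied wherever file URIs are compared exactly, for example when matching position delete entries against data files; whether that is the right fix for Impala is left open here.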



--
This message was sent by Atlassian Jira
(v8.20.10#820010)