[ 
https://issues.apache.org/jira/browse/IMPALA-11752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864781#comment-17864781
 ] 

Steve Loughran commented on IMPALA-11752:
-----------------------------------------

Also: support the alternative option of mapping s3 to s3a. I know there is a 
consensus view that S3FileIO is superior to the s3a filesystem, but I am very 
confident my code has encountered more errors and therefore has more error 
handling, even though the move to the AWS v2 SDK is still finding new regressions.
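
For illustration, the s3-to-s3a mapping could be expressed as a Hadoop configuration fragment along these lines. This is a minimal sketch built from the property name and class given in the issue description ("fs.s3.impl" and "org.apache.hadoop.fs.s3a.S3AFileSystem"). Placing it in core-site.xml is an assumption, and the thread does not confirm whether this setting alone resolves the exact-match lookups described in the issue.

```xml
<!-- Assumed core-site.xml fragment: resolve the s3:// scheme with the S3A connector -->
<property>
  <name>fs.s3.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
```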

> Handle s3:// paths in Iceberg tables
> ------------------------------------
>
>                 Key: IMPALA-11752
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11752
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Gabor Kaszab
>            Priority: Major
>              Labels: impala-iceberg
>
> Components using 
> [S3FileIO|https://iceberg.apache.org/docs/latest/aws/#s3-fileio] might write 
> out file paths starting with 's3://' instead of 's3a://'. The latter is used 
> by 
> [HadoopFileIO|https://iceberg.apache.org/docs/latest/aws/#hadoop-s3a-filesystem]
>  that Impala is using.
> By default, HadoopFileIO doesn't interpret paths starting with 's3://'. 
> (Probably this could be resolved by setting "fs.s3.impl" to 
> "org.apache.hadoop.fs.s3a.S3AFileSystem" so that an s3a fs instance is 
> created)
> [FeIcebergTable.Utils.FeIcebergTable()|https://github.com/apache/impala/blob/2733d039ad4a830a1ea34c1a75d2b666788e39a9/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java#L671-L689]
>  depends on the file paths returned by recursive file listing matching the file 
> paths in Iceberg metadata files. But the recursive listing returns s3a:// 
> paths, while metadata contains s3:// paths, which means we'll load files 
> one-by-one as we won't find the files in the hash map 'hdfsFileDescMap'.
> Moreover, position delete file processing is also based on exact matches 
> of the file URIs. Therefore entries with s3:// paths won't have the 
> desired effects.
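
The lookup mismatch described in the issue can be sketched in a few lines of Java. This is a hypothetical illustration, not Impala's actual fix: the class `S3SchemeNormalizer` and its `normalize` method are invented for this example, and the map's value type is a placeholder string rather than a real file descriptor. Only the map name 'hdfsFileDescMap' comes from the issue. The idea is to rewrite a leading s3:// to s3a:// before the exact-match lookup, so that metadata paths and listing paths compare equal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of Impala or Iceberg.
final class S3SchemeNormalizer {
    private static final String S3_PREFIX = "s3://";
    private static final String S3A_PREFIX = "s3a://";

    // Rewrites a leading "s3://" to "s3a://"; every other path is returned unchanged.
    static String normalize(String path) {
        if (path.startsWith(S3_PREFIX)) {
            return S3A_PREFIX + path.substring(S3_PREFIX.length());
        }
        return path;
    }

    public static void main(String[] args) {
        // Stand-in for the map keyed by paths from the recursive listing (s3a://).
        Map<String, String> hdfsFileDescMap = new HashMap<>();
        hdfsFileDescMap.put("s3a://bucket/wh/t/data/00000-0.parquet", "descriptor-0");

        // Path as it might appear in Iceberg metadata written through S3FileIO.
        String metadataPath = "s3://bucket/wh/t/data/00000-0.parquet";

        // The raw key misses because the schemes differ; the normalized key hits.
        System.out.println("raw lookup: " + hdfsFileDescMap.get(metadataPath));
        System.out.println("normalized lookup: "
            + hdfsFileDescMap.get(normalize(metadataPath)));
    }
}
```

The same normalization would have to be applied wherever file URIs are compared by exact match, including the position delete processing mentioned above. A plain prefix check is used instead of URI parsing so that object keys containing characters that are not valid in a URI do not cause parse failures.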



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
