[jira] [Commented] (HADOOP-19348) S3A: Add initial support for analytics-accelerator-s3

ASF GitHub Bot (Jira) Tue, 04 Feb 2025 05:48:05 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-19348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923711#comment-17923711
 ]


ASF GitHub Bot commented on HADOOP-19348:
-----------------------------------------

ahmarsuhail commented on PR #7334:
URL: https://github.com/apache/hadoop/pull/7334#issuecomment-2634021728

   A few things to discuss here:
   
   * Now that we're using S3A's async client, which already has the execution 
interceptors attached, a lot of tests fail because out-of-span operations get 
rejected. Since we don't support auditing right now, can we recommend that 
anyone running with AAL turned on also turns off 
`fs.s3a.audit.reject.out.of.span.operations`?
   
   * The async client from the current SDK version doesn't do ranged GETs if 
`multipartEnabled` is set on it. To get ranged GETs, we either upgrade the SDK 
or temporarily disable `multipartEnabled` when AAL is enabled, similar to 
https://github.com/apache/hadoop/blob/950b3eb431de99885521095efc4cc65ee9252db7/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java#L169
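
A minimal sketch of the two workarounds discussed above, not the code in the PR. It assumes a plain `Map` as a stand-in for Hadoop's `Configuration` so that it runs without Hadoop on the classpath. The class `AalSettingsSketch`, its method names, and the `aalEnabled` flag are hypothetical; only the property name `fs.s3a.audit.reject.out.of.span.operations` comes from the comment.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: a plain Map stands in for Hadoop's Configuration. */
class AalSettingsSketch {

    // Property name taken from the comment above.
    static final String REJECT_OUT_OF_SPAN =
        "fs.s3a.audit.reject.out.of.span.operations";

    /**
     * Returns a copy of conf in which out-of-span rejection is turned off
     * when AAL is on. The aalEnabled flag is a hypothetical input; how AAL
     * is actually switched on is not specified here.
     */
    static Map<String, String> withAalOverrides(Map<String, String> conf,
                                                boolean aalEnabled) {
        Map<String, String> result = new HashMap<>(conf);
        if (aalEnabled) {
            result.put(REJECT_OUT_OF_SPAN, "false");
        }
        return result;
    }

    /**
     * Decides the value to pass as multipartEnabled when building the async
     * client: keep the requested value unless AAL is on, because AAL needs
     * ranged GETs.
     */
    static boolean effectiveMultipartEnabled(boolean requested,
                                             boolean aalEnabled) {
        return requested && !aalEnabled;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(REJECT_OUT_OF_SPAN, "true");

        Map<String, String> adjusted = withAalOverrides(conf, true);
        System.out.println(REJECT_OUT_OF_SPAN + " = "
            + adjusted.get(REJECT_OUT_OF_SPAN));
        System.out.println("multipartEnabled = "
            + effectiveMultipartEnabled(true, true));
    }
}
```

Both helpers leave the caller's settings untouched when AAL is off, so the workaround only takes effect for AAL runs and can be dropped once auditing is supported or the SDK is upgraded.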




> S3A: Add initial support for analytics-accelerator-s3
> -----------------------------------------------------
>
>                 Key: HADOOP-19348
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19348
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.2
>            Reporter: Ahmar Suhail
>            Priority: Major
>              Labels: pull-request-available
>
> S3 recently released [Analytics Accelerator Library for Amazon 
> S3|https://github.com/awslabs/analytics-accelerator-s3] as an alpha release. 
> It is an input stream whose initial goal is to improve performance for 
> Apache Spark workloads on Parquet datasets. 
> For example, it implements optimisations such as footer prefetching, and so 
> avoids the multiple GETs S3AInputStream currently makes for the footer bytes 
> and PageIndex structures.
> The library also uses the Parquet metadata to track the columns a query is 
> currently reading, and then prefetches these bytes when Parquet files with 
> the same schema are opened. 
> This ticket tracks the work required for the basic initial integration. There 
> is still more work to be done, such as VectoredIO support, which we will 
> identify and follow up on. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
