[ https://issues.apache.org/jira/browse/HADOOP-19348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906223#comment-17906223 ]

ASF GitHub Bot commented on HADOOP-19348:
-----------------------------------------

mukund-thakur commented on code in PR #7192:
URL: https://github.com/apache/hadoop/pull/7192#discussion_r1887721548


##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java:
##########
@@ -45,6 +46,12 @@ public class ITestS3AContractRename extends AbstractContractRenameTest {
   public static final Logger LOG = LoggerFactory.getLogger(
       ITestS3AContractRename.class);
 
+  @Override
+  public void setup() throws Exception {
+    super.setup();
+    skipIfAnalyticsAcceleratorEnabled(getContract().getConf());

Review Comment:
   Rather than skipping, can we not configure these tests to run with 
S3ASeekableStream instead of S3AInputStream? That would give huge test coverage 
for the new read flow.
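
   A minimal sketch of what that could look like, assuming the new stream can be 
switched on purely through configuration. This is not the PR's actual change: the 
subclass name and the property key "fs.s3a.analytics.accelerator.enabled" are 
assumptions used here for illustration only.

    // Sketch only, not the PR's code: a test variant that enables the new
    // stream instead of skipping, so the rename contract tests also exercise
    // the analytics-accelerator read path.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.s3a.S3ATestUtils;

    public class ITestS3AContractRenameWithAnalyticsStream
        extends ITestS3AContractRename {

      /** Assumed configuration key enabling the analytics-accelerator stream. */
      private static final String ANALYTICS_ACCELERATOR_ENABLED =
          "fs.s3a.analytics.accelerator.enabled";

      @Override
      protected Configuration createConfiguration() {
        Configuration conf = super.createConfiguration();
        // clear any per-bucket override so the flag below always takes effect
        S3ATestUtils.removeBaseAndBucketOverrides(conf, ANALYTICS_ACCELERATOR_ENABLED);
        conf.setBoolean(ANALYTICS_ACCELERATOR_ENABLED, true);
        return conf;
      }
    }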





> S3A: Add initial support for analytics-accelerator-s3
> -----------------------------------------------------
>
>                 Key: HADOOP-19348
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19348
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.2
>            Reporter: Ahmar Suhail
>            Priority: Major
>              Labels: pull-request-available
>
> S3 recently released the [Analytics Accelerator Library for Amazon 
> S3|https://github.com/awslabs/analytics-accelerator-s3] as an alpha release. It 
> is an input stream implementation with an initial goal of improving performance 
> for Apache Spark workloads on Parquet datasets. 
> For example, it implements optimisations such as footer prefetching, and so 
> avoids the multiple GETs S3AInputStream currently makes for the footer bytes 
> and PageIndex structures.
> The library also tracks the columns currently being read by a query using the 
> Parquet metadata, and then prefetches those bytes when Parquet files with the 
> same schema are opened. 
> This ticket tracks the work required for the basic initial integration. There 
> is still more work to be done, such as VectoredIO support, which we will 
> identify and follow up on. 
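
As a rough usage sketch (not taken from this ticket), and assuming the integration 
is exposed as a simple opt-in switch on the S3A configuration, the new stream would 
be transparent to callers reading Parquet footers. The property key below is an 
assumption for illustration, not a key defined by this ticket.

    // Illustrative only; the configuration key is assumed, not defined here.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AnalyticsStreamFooterRead {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // assumed opt-in flag for the analytics-accelerator input stream
        conf.setBoolean("fs.s3a.analytics.accelerator.enabled", true);

        Path parquet = new Path("s3a://example-bucket/data/part-00000.parquet");
        FileSystem fs = parquet.getFileSystem(conf);
        long len = fs.getFileStatus(parquet).getLen();

        try (FSDataInputStream in = fs.open(parquet)) {
          // The last 8 bytes of a Parquet file hold the footer length and magic;
          // with footer prefetching these reads should not need extra GETs.
          byte[] tail = new byte[8];
          in.readFully(len - 8, tail);
        }
      }
    }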


