[ 
https://issues.apache.org/jira/browse/ARROW-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286849#comment-17286849
 ] 

Will Jones commented on ARROW-11558:
------------------------------------

I am also interested to see in which situations S3 Select is beneficial.

However, which parts do you think should be part of the Arrow library itself?

From what I can tell, the S3 Select endpoints give back a stream of JSON or 
CSV, which you could probably deserialize with the existing Arrow JSON and CSV 
readers. So this might be functionality you could build _using_ Arrow, rather 
than something that needs to be built _into_ Arrow. In fact, much of this might 
be more appropriate to have in 
[AWS Data Wrangler|https://github.com/awslabs/aws-data-wrangler], which already 
uses the Arrow library for reading Parquet from S3.
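
To make the "build using Arrow" idea concrete, here is a rough, untested-against-AWS sketch in Python. It assumes boto3's select_object_content call and the pyarrow CSV reader; the function names (collect_record_bytes, select_to_table), the CSV input format, and the header handling are illustrative assumptions of mine, not anything that exists in Arrow or AWS Data Wrangler today.

```python
import io
from typing import Iterable, Mapping


def collect_record_bytes(events: Iterable[Mapping]) -> bytes:
    """Concatenate the 'Records' payload chunks of an S3 Select event stream.

    Events without a 'Records' key (for example 'Stats' or 'End') are skipped.
    """
    chunks = []
    for event in events:
        records = event.get("Records")
        if records is not None:
            chunks.append(records["Payload"])
    return b"".join(chunks)


def select_to_table(bucket: str, key: str, expression: str):
    """Hypothetical helper: run an S3 Select query and parse the result with Arrow.

    Assumes the object is a CSV file with a header row.
    """
    # Imported lazily so collect_record_bytes works without these packages.
    import boto3
    import pyarrow.csv as pacsv

    s3 = boto3.client("s3")
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    data = collect_record_bytes(response["Payload"])

    # Assumption: the CSV output carries no header row, so let Arrow
    # generate column names instead of consuming the first data row.
    read_options = pacsv.ReadOptions(autogenerate_column_names=True)
    return pacsv.read_csv(io.BytesIO(data), read_options=read_options)
```

This buffers the whole result in memory before parsing, which is the simplest thing that could work; a real implementation would probably want to feed the chunks to a streaming reader instead.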

> [C++] Push down projection and selection to S3 Select
> -----------------------------------------------------
>
>                 Key: ARROW-11558
>                 URL: https://issues.apache.org/jira/browse/ARROW-11558
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Ian Cook
>            Priority: Major
>              Labels: filesystem
>
> Amazon S3 Select [1], an S3 feature generally available since April 2018 [2], 
> can improve S3 read performance by allowing S3 clients to use a limited 
> subset of SQL to specify projection and selection [3] on data in some formats 
> [4]. It would be interesting to try using this in Arrow and to measure its 
> effects on S3 read performance under various conditions.
> [1] [https://aws.amazon.com/blogs/aws/s3-glacier-select/]
> [2] [https://aws.amazon.com/about-aws/whats-new/2018/04/amazon-s3-select-is-now-generally-available/]
> [3] [https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-glacier-select-sql-reference-select.html]
> [4] [https://docs.aws.amazon.com/cli/latest/reference/s3api/select-object-content.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
