[
https://issues.apache.org/jira/browse/HADOOP-15364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460077#comment-16460077
]
Steve Loughran commented on HADOOP-15364:
-----------------------------------------
Exactly. The various other options (line endings, quotes, header, etc.) would
also be things that you'd declare as being needed. The strings would all be
explicitly s3a:-prefixed to avoid conflict with any standard ones, e.g.
"s3a:select.query", "s3a.select.header", ...
This is also useful for general IO performance: the ORC/Parquet code could call
fs.openFile("/data/2017/10/12/data.orc").opt("fs.fadvise", "random") to hint
that it intends to do random IO, so the IO policy of the stream should plan
for it (no more GETs to end of content, etc.). Because it's most critical in a
few libraries (those two in particular), we don't need broader takeup for this
to have tangible benefits for many.
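
As a rough illustration of the idea, here is a minimal, self-contained sketch of a builder that separates optional hints from options the caller declares as needed. Everything in it (the OpenFileSketch class, its Builder, the must() method, and the build(supported) signature) is hypothetical and written only for this example; it is not the Hadoop API, and the real HADOOP-15229 design may differ. The option keys are the ones quoted above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: none of these types exist in Hadoop.
public class OpenFileSketch {

    /** Collects optional hints and mandatory options for a single open call. */
    public static final class Builder {
        private final String path;
        private final Map<String, String> optional = new LinkedHashMap<>();
        private final Map<String, String> mandatory = new LinkedHashMap<>();

        Builder(String path) {
            this.path = path;
        }

        public String path() {
            return path;
        }

        /** A hint: silently ignored by stores that do not understand the key. */
        public Builder opt(String key, String value) {
            optional.put(key, value);
            return this;
        }

        /** An option declared as needed: the open fails if the store lacks it. */
        public Builder must(String key, String value) {
            mandatory.put(key, value);
            return this;
        }

        /** Resolve the options against the set of keys a store supports. */
        public Map<String, String> build(Set<String> supported) {
            Map<String, String> effective = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : mandatory.entrySet()) {
                if (!supported.contains(e.getKey())) {
                    throw new IllegalArgumentException(
                        "Unsupported mandatory option: " + e.getKey());
                }
                effective.put(e.getKey(), e.getValue());
            }
            for (Map.Entry<String, String> e : optional.entrySet()) {
                if (supported.contains(e.getKey())) {
                    effective.put(e.getKey(), e.getValue());
                }
            }
            return effective;
        }
    }

    public static Builder openFile(String path) {
        return new Builder(path);
    }

    public static void main(String[] args) {
        // Keys this imaginary store understands (an assumption for the demo).
        Set<String> storeKeys = new HashSet<>(
            Arrays.asList("fs.fadvise", "s3a:select.query"));

        Builder b = openFile("/data/2017/10/12/data.orc")
            .opt("fs.fadvise", "random")     // understood, so it is applied
            .opt("other.store.hint", "x");   // unknown hint, so it is dropped
        System.out.println(b.path() + " -> " + b.build(storeKeys));
    }
}
```

The point of the split is that a library such as the ORC or Parquet reader can pass a hint unconditionally and still work against stores that ignore it, while a caller that depends on a select query gets a failure up front rather than unfiltered data.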
> Add support for S3 Select to S3A
> --------------------------------
>
> Key: HADOOP-15364
> URL: https://issues.apache.org/jira/browse/HADOOP-15364
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Attachments: HADOOP-15364-001.patch, HADOOP-15364-002.patch
>
>
> Expect a PoC patch for this in a couple of days;
> * it'll depend on an SDK update to work, plus a couple of other minor
> changes
> * Adds command line option too
> {code}
> hadoop s3guard select -header use -compression gzip -limit 100 \
> "s3a://landsat-pds/scene_list.gz" \
> "SELECT s.entityId FROM S3OBJECT s WHERE s.cloudCover = '0.0' "
> {code}
> For wider use we'll need to implement HADOOP-15229 so that callers can
> pass down the expression along with any other parameters
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)