Steve Loughran commented on HADOOP-15364:

Patch 002
# addresses checkstyle & javadocs
# the {{-out}} option now takes a filesystem path rather than a local file: {{-out <file>}} becomes {{-out <path>}}
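Because {{-out}} now takes a path rather than a local file, the destination store follows from the URI scheme of that path. A minimal Java sketch of that dispatch, assuming a hypothetical `OutPathSketch` class with a hard-coded scheme table (Java 9+); real Hadoop resolves the filesystem through `FileSystem.get(URI, Configuration)` and its configuration, not through a table like this:

```java
import java.net.URI;
import java.util.Map;

// Hypothetical sketch: the -out path's URI scheme selects the target store.
// The scheme table is illustrative only; Hadoop itself resolves schemes via
// FileSystem.get(URI, Configuration) and its configuration.
class OutPathSketch {
    private static final Map<String, String> STORES = Map.of(
        "file", "local filesystem",
        "hdfs", "HDFS",
        "s3a", "Amazon S3 (S3A)",
        "adl", "Azure Data Lake (ADL)");

    /** Describe which store a given -out path would be written to. */
    static String storeFor(String outPath) {
        String scheme = URI.create(outPath).getScheme();
        if (scheme == null) {
            // No scheme: Hadoop would fall back to the default filesystem (fs.defaultFS).
            return "default filesystem";
        }
        String store = STORES.get(scheme);
        if (store == null) {
            throw new IllegalArgumentException("No filesystem known for scheme: " + scheme);
        }
        return store;
    }

    public static void main(String[] args) {
        System.out.println(storeFor("adl://store/datasets-18"));     // Azure Data Lake (ADL)
        System.out.println(storeFor("s3a://datasets/allyears.csv")); // Amazon S3 (S3A)
        System.out.println(storeFor("/tmp/out.csv"));                // default filesystem
    }
}
```

Since the source and destination are resolved independently, reading from one store and writing to another needs no special handling in the command itself.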

Tested against: S3 Ireland.

Item #2 is cute, as you can now do cross-infrastructure ETL operations like:
{code}
hadoop s3guard select -header use -out adl://store/datasets-18 \
  s3a://datasets/allyears.csv "SELECT * FROM S3OBJECT s WHERE s.year = '2018'"
{code}

Though it'll lose the header on the way through, which isn't ideal, as the header is the closest thing CSV files have to a schema.
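To show why the header disappears, here is a local, in-memory emulation (not S3 Select itself) of the query above run with {{-header use}}: the first line supplies the column names and is consumed, so it only reaches the output if the writer explicitly puts it back. `HeaderSelectSketch` and its `keepHeader` flag are hypothetical, and the naive comma splitting ignores CSV quoting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical local emulation of
//   SELECT * FROM S3OBJECT s WHERE s.year = '2018'
// with the header in "use" mode: line 0 names the columns and is consumed.
// Naive comma splitting (no quoted fields); for illustration only.
class HeaderSelectSketch {

    static List<String> select(List<String> csvLines, String column, String value,
                               boolean keepHeader) {
        List<String> out = new ArrayList<>();
        if (csvLines.isEmpty()) {
            return out;
        }
        String header = csvLines.get(0);
        int col = List.of(header.split(",")).indexOf(column);
        if (col < 0) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        if (keepHeader) {
            out.add(header); // put the header back so the output keeps its "schema"
        }
        for (String line : csvLines.subList(1, csvLines.size())) {
            String[] fields = line.split(",", -1);
            if (col < fields.length && fields[col].equals(value)) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = List.of("id,year", "1,2017", "2,2018", "3,2018");
        System.out.println(select(data, "year", "2018", false)); // [2,2018, 3,2018]
        System.out.println(select(data, "year", "2018", true));  // [id,year, 2,2018, 3,2018]
    }
}
```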

I actually considered having an {{-avro schema}} option to save the data as 
Avro instead: after all, Avro is already on the classpath. But that's a bit of 
feature creep; if it were done, a format-independent plugin point would be more 
flexible. Not on my TODO list.
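A format-independent plugin point could be as small as a sink interface that receives the header and each selected record. A hypothetical Java sketch: `RecordSink` and `CsvRecordSink` are invented names, and an Avro sink would simply be a second implementation (not shown, as it would need the Avro library).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

// Hypothetical plugin point: the select command would hand every record to
// a RecordSink, so CSV, Avro, etc. become interchangeable output formats.
interface RecordSink extends AutoCloseable {
    /** Called once with the column names, if the source has a header. */
    void header(List<String> columns) throws IOException;

    /** Called once per selected record. */
    void record(List<String> fields) throws IOException;

    @Override
    void close() throws IOException;
}

/** Plain CSV sink that writes the header back out as the first line. */
class CsvRecordSink implements RecordSink {
    private final Writer out;

    CsvRecordSink(Writer out) {
        this.out = out;
    }

    @Override
    public void header(List<String> columns) throws IOException {
        record(columns);
    }

    @Override
    public void record(List<String> fields) throws IOException {
        out.write(String.join(",", fields));
        out.write('\n');
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}

class RecordSinkDemo {
    public static void main(String[] args) throws IOException {
        StringWriter buffer = new StringWriter();
        try (RecordSink sink = new CsvRecordSink(buffer)) {
            sink.header(List.of("id", "year"));
            sink.record(List.of("2", "2018"));
        }
        System.out.print(buffer); // prints "id,year" and "2,2018" on separate lines
    }
}
```

Passing the header through the same interface as the records is also one way the header-loss problem above could be addressed, since each sink decides how to persist it.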

> Add support for S3 Select to S3A
> --------------------------------
>                 Key: HADOOP-15364
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15364
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>         Attachments: HADOOP-15364-001.patch, HADOOP-15364-002.patch
> Expect a PoC patch for this in a couple of days:
> * it'll depend on an SDK update to work, plus a couple of other minor
> changes
> * Adds a command-line option too:
> {code}
> hadoop s3guard select -header use -compression gzip -limit 100 \
> s3a://landsat-pds/scene_list.gz \
> "SELECT s.entityId FROM S3OBJECT s WHERE s.cloudCover = '0.0'"
> {code}
> For wider use we'll need to implement HADOOP-15229 so that callers can
> pass down the expression along with any other parameters.
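HADOOP-15229 is what would let callers pass the expression down along with other parameters. A hypothetical sketch of that shape in plain Java (9+): `OpenFileRequest`, `must()`, `opt()` and the `example.select.*` option keys are invented for illustration and are not the API defined in that JIRA. In this sketch the SQL travels as a mandatory option, so a store that cannot honour it fails fast, while hints such as header handling or compression stay optional.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical builder-style open request: the select expression is passed
// as one option among others. All names and keys here are illustrative.
final class OpenFileRequest {
    final String path;
    final Map<String, String> mandatory;
    final Map<String, String> optional;

    private OpenFileRequest(String path, Map<String, String> mandatory,
                            Map<String, String> optional) {
        this.path = path;
        this.mandatory = Collections.unmodifiableMap(new LinkedHashMap<>(mandatory));
        this.optional = Collections.unmodifiableMap(new LinkedHashMap<>(optional));
    }

    static Builder builder(String path) {
        return new Builder(path);
    }

    /** Reject the request if any mandatory option is one the store cannot honour. */
    void checkSupported(Set<String> supportedKeys) {
        for (String key : mandatory.keySet()) {
            if (!supportedKeys.contains(key)) {
                throw new IllegalArgumentException("Unsupported mandatory option: " + key);
            }
        }
    }

    static final class Builder {
        private final String path;
        private final Map<String, String> mandatory = new LinkedHashMap<>();
        private final Map<String, String> optional = new LinkedHashMap<>();

        private Builder(String path) {
            this.path = path;
        }

        /** An option the store must understand, or the open fails. */
        Builder must(String key, String value) {
            mandatory.put(key, value);
            return this;
        }

        /** A hint the store may ignore. */
        Builder opt(String key, String value) {
            optional.put(key, value);
            return this;
        }

        OpenFileRequest build() {
            return new OpenFileRequest(path, mandatory, optional);
        }
    }

    public static void main(String[] args) {
        OpenFileRequest req = OpenFileRequest.builder("s3a://landsat-pds/scene_list.gz")
            .must("example.select.sql",
                  "SELECT s.entityId FROM S3OBJECT s WHERE s.cloudCover = '0.0'")
            .opt("example.select.header", "use")
            .opt("example.select.compression", "gzip")
            .build();
        req.checkSupported(Set.of("example.select.sql")); // accepted: SQL is supported
        // prints: s3a://landsat-pds/scene_list.gz [example.select.sql]
        System.out.println(req.path + " " + req.mandatory.keySet());
    }
}
```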

This message was sent by Atlassian JIRA
