[ https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-15229:
------------------------------------
       Resolution: Fixed
    Fix Version/s: 3.3.0
           Status: Resolved  (was: Patch Available)

OK, this is done.

Thank you to all the reviewers, especially those new to the project who passed 
on their experience.

For proper support of S3 Select through input formats, see MAPREDUCE-7182. Note 
that this could also address S3 CSE (client-side encryption) where, again, the 
length of the data read differs from the length of the stored file, and this 
fact is not known until the file is opened (see HADOOP-13887).

For all the people watching this issue: I've got too many other things on my 
TODO lists to work on this. In the short term I'm working on ABFS and delegation 
tokens (DTs), wrapping up S3Guard corner cases, and then doing the vectored read 
of HADOOP-11867, as well as catching up on various S3A patches which I'd been 
ignoring pending this one getting in.

h2. Call for Contributions

If you have any code on the MR side of things to help address MAPREDUCE-7182, 
including test cases (especially failure conditions happening in the second or 
later page of S3 Select responses), please submit your contributions under that 
issue. Thanks!

> Add FileSystem builder-based openFile() API to match createFile(); S3A to 
> implement S3 Select through this API.
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-15229
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15229
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch, 
> HADOOP-15229-009.patch, HADOOP-15229-010.patch, HADOOP-15229-011.patch, 
> HADOOP-15229-012.patch, HADOOP-15229-013.patch, HADOOP-15229-014.patch, 
> HADOOP-15229-015.patch, HADOOP-15229-016.patch, HADOOP-15229-017.patch, 
> HADOOP-15229-018.patch, HADOOP-15229-019.patch, HADOOP-15229-020.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS; it is to pass in the fadvise policy 
> for working with object stores, where the choice between a full GET with a 
> TCP abort on seek and a series of smaller GETs is fundamental: the wrong 
> option can cost you minutes. S3A and Azure both have adaptive policies now 
> (switching policy on the first backward seek), but they still don't do it 
> that well.
> Columnar formats (ORC, Parquet) should be able to pass "fs.input.fadvise" = 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method, 
> ideally with as much code reuse as possible.
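The builder idea in the description can be sketched in plain Java. This is a minimal, hypothetical model, not Hadoop's implementation: the classes `SketchFileSystem`, `OpenFileBuilder` and `OpenedStream` are invented for illustration, and `fs.input.fadvise` is the key proposed in the description rather than a shipped configuration option. (In the API added by this issue, `FileSystem.openFile(Path)` returns a `FutureDataInputStreamBuilder` whose `build()` yields a `CompletableFuture<FSDataInputStream>`.) The sketch only shows the intended semantics: `opt()` sets hints a store may ignore, `must()` sets options the store has to understand, and `build()` returns a future so the store can defer the actual open.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

/**
 * Hypothetical sketch of a builder-style openFile() call. None of these
 * classes are Hadoop's: opt() options are hints that may be ignored,
 * must() options fail the open if the store does not understand them.
 */
public class OpenFileBuilderSketch {

  /** Option key taken from the issue description (assumed, not a real config key). */
  static final String FADVISE = "fs.input.fadvise";

  /** What the store would use once the stream is opened. */
  static final class OpenedStream {
    final String path;
    final String readPolicy;

    OpenedStream(String path, String readPolicy) {
      this.path = path;
      this.readPolicy = readPolicy;
    }
  }

  /** Hypothetical store which only understands the fadvise key. */
  static final class SketchFileSystem {
    private final Set<String> supportedKeys = Collections.singleton(FADVISE);

    OpenFileBuilder openFile(String path) {
      return new OpenFileBuilder(this, path);
    }
  }

  static final class OpenFileBuilder {
    private final SketchFileSystem fs;
    private final String path;
    private final Map<String, String> optional = new HashMap<>();
    private final Map<String, String> mandatory = new HashMap<>();

    OpenFileBuilder(SketchFileSystem fs, String path) {
      this.fs = fs;
      this.path = path;
    }

    /** Optional hint: silently ignored if the store does not know the key. */
    OpenFileBuilder opt(String key, String value) {
      optional.put(key, value);
      return this;
    }

    /** Mandatory option: build() fails if the store does not know the key. */
    OpenFileBuilder must(String key, String value) {
      mandatory.put(key, value);
      return this;
    }

    /** Validates mandatory keys eagerly, then opens asynchronously. */
    CompletableFuture<OpenedStream> build() {
      for (String key : mandatory.keySet()) {
        if (!fs.supportedKeys.contains(key)) {
          throw new IllegalArgumentException("Unsupported mandatory option: " + key);
        }
      }
      // Mandatory values take precedence over optional ones for the same key.
      Map<String, String> effective = new HashMap<>(optional);
      effective.putAll(mandatory);
      String policy = effective.getOrDefault(FADVISE, "normal");
      return CompletableFuture.supplyAsync(() -> new OpenedStream(path, policy));
    }
  }

  public static void main(String[] args) throws InterruptedException, ExecutionException {
    SketchFileSystem fs = new SketchFileSystem();
    // A columnar reader hints that it will seek around the file.
    OpenedStream in = fs.openFile("s3a://bucket/data.orc")
        .opt(FADVISE, "random")
        .opt("some.unknown.option", "ignored")
        .build()
        .get();
    System.out.println(in.readPolicy); // prints "random"
  }
}
```

Rejecting an unknown `must()` key inside `build()`, rather than inside the returned future, is a choice made by this sketch; it surfaces configuration mistakes at the call site, while unknown `opt()` hints leave the open unaffected.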



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

