[jira] [Updated] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints

Steve Loughran (JIRA) Wed, 08 Mar 2017 14:17:35 -0800

     [ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Steve Loughran updated HADOOP-13786:
------------------------------------
    Attachment: HADOOP-13786-HADOOP-13345-009.patch

Patch 009; pulling in Ryan's netflix committer; I've called in "the Staging 
committer", with the one playing games in S3a "the magic committer"; that one 
I've actually put to one side as an "eventually" feature; the goal being: 
common code & data underneath, but different strategies of getting the data up.

Details at the end of: 
https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3a_committer.md

It uses the existing FileOutputCommitter for its work; some time was spent in a 
debugger until I worked out that the code relies on Algorithm 1, not the 
current default of "2". Ideally I'd rip out the FOC altogether, it's just a 
complication once each task only generates a single file whose name can be 
guaranteed to be unique. Task commits copy into the job dir; job commits rename 
to dest, files are scanned and used to generate the multipart commits for the 
real dest. Leaving alone for now.

* All of Ryan's tests are failing as the FS mocking is being rejected "not an 
s3a filesystem"'; more migration work needed there. 
* I've been working on the protocol integration test copied out from the 
mapreduce module; this tests various sequences of the operations and asserts 
about final outcomes. Some of them are starting to work (core commit and 
abort), but not those with subdirectories, such as the intermediate Map output.

> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, 
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch, 
> HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch, 
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch, 
> HADOOP-13786-HADOOP-13345-007.patch, HADOOP-13786-HADOOP-13345-009.patch, 
> s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[jira] [Updated] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints

Reply via email to