[jira] [Comment Edited] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints

Steve Loughran (JIRA) Mon, 13 Mar 2017 14:59:06 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923030#comment-15923030 ]


Steve Loughran edited comment on HADOOP-13786 at 3/13/17 9:58 PM:
------------------------------------------------------------------

patch 012:
* UUID reinstated by default; you can turn this off.
* Most of the unit tests are passing.
* The mocks in {{TestStagingPartitionedJobCommit}} and 
{{TestStagingDirectoryOutputCommitter}} are seeing an unexpected delete in their 
mock invocations (good? bad?), and because the staging committer doesn't yet 
handle directories, some of the protocol tests are failing.
* Added {{AbstractITCommitProtocol}} subclasses for the directory and partition 
committers
* The protocol IT tests that exercise job/commit failure are failing because 
the tests aren't looking for the right exception.

One thing to highlight here is that when running these tests from my desktop, 
the staging commits seem to be faster than the magic ones. Why? A lot less S3 
communication over a long-haul link during task setup/commit, and there's no 
real data to upload, so the cost of delaying the upload until the task commit 
is negligible.
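
The reasoning above can be sketched as a rough latency model: time spent is roughly the number of S3 requests multiplied by the round-trip time, plus the payload size divided by the bandwidth. Everything in this sketch is an assumption made for illustration. The function name, the round-trip counts, the latency and the bandwidth are hypothetical and are not measurements from the patch or its tests.

```python
def commit_time_s(round_trips, rtt_s, payload_bytes, bandwidth_bytes_per_s):
    """Rough estimate: per-request latency plus time to move the payload."""
    return round_trips * rtt_s + payload_bytes / bandwidth_bytes_per_s


# Hypothetical values, chosen only to show the shape of the trade-off.
RTT_S = 0.1            # one S3 request over a long-haul link
BANDWIDTH = 1_000_000  # bytes per second
PAYLOAD = 1_000        # test output is tiny, so upload time barely registers

# Assumed request counts: the staging committer talks to S3 less often
# during task setup/commit than the magic committer does.
staging = commit_time_s(5, RTT_S, PAYLOAD, BANDWIDTH)
magic = commit_time_s(20, RTT_S, PAYLOAD, BANDWIDTH)

print(f"staging: {staging:.3f}s, magic: {magic:.3f}s")
```

With a payload this small the latency term dominates, so whichever committer issues fewer requests finishes first. With large outputs the transfer term would grow for both, and this toy model says nothing about how the two would then compare.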


was (Author: [email protected]):
patch 012:  
* mostly all the tests are passing, 
* the mocks in {{TestStagingPartitionedJobCommit}}, 
{{TestStagingDirectoryOutputCommitter}}  are having an unexpected delete in 
mock invocations (good? bad?), and because the staging committer doesn't yet 
handle directories, some of the protocol tests are failing.

One thing to highlight here is that when running these tests from my desktop, 
the staging commits seem to be faster than the magic ones. Why? A lot less S3 
communication over a long haul link during task setup/commit, and there's no 
real data to upload, so the cost delaying the upload until the task commit is 
negligible.

> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, 
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch, 
> HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch, 
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch, 
> HADOOP-13786-HADOOP-13345-007.patch, HADOOP-13786-HADOOP-13345-009.patch, 
> HADOOP-13786-HADOOP-13345-010.patch, HADOOP-13786-HADOOP-13345-011.patch, 
> HADOOP-13786-HADOOP-13345-012.patch, s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

