[jira] [Updated] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints

Steve Loughran (JIRA) Wed, 03 May 2017 11:16:31 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Steve Loughran updated HADOOP-13786:
------------------------------------
    Attachment: HADOOP-13786-HADOOP-13345-025.patch

Patch 026; 
* rebased to HADOOP-13345
* the {{_SUCCESS}} marker file is now a JSON manifest file listing: committer, 
host, timestamp, files committed

The success file lets us verify that the new committer was actually used (guess 
what: it wasn't), and we can add extra assertions on it. This is the toString 
value from a test run

{code}
SuccessData{
  committer='PartitionedStagingCommitter',
  hostname='devbox.local',
  description='Task committer attempt_1493832493956_0001_m_000000_0',
  date='Wed May 03 18:28:41 BST 2017',
  filenames=[test/testMRJob/part-m-00000, test/testMRJob/part-m-00002, 
test/testMRJob/part-m-00001]
}
{code}

We could actually use this to collect low-level timing info on operations, 
retry counts, etc, even where the execution engine doesn't: something which 
could be used for diagnostics, perhaps. Checking the listed filenames with the 
actual set of files in the directory would be a way of verifying that the work 
is all visible.

> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, 
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch, 
> HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch, 
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch, 
> HADOOP-13786-HADOOP-13345-007.patch, HADOOP-13786-HADOOP-13345-009.patch, 
> HADOOP-13786-HADOOP-13345-010.patch, HADOOP-13786-HADOOP-13345-011.patch, 
> HADOOP-13786-HADOOP-13345-012.patch, HADOOP-13786-HADOOP-13345-013.patch, 
> HADOOP-13786-HADOOP-13345-015.patch, HADOOP-13786-HADOOP-13345-016.patch, 
> HADOOP-13786-HADOOP-13345-017.patch, HADOOP-13786-HADOOP-13345-018.patch, 
> HADOOP-13786-HADOOP-13345-019.patch, HADOOP-13786-HADOOP-13345-020.patch, 
> HADOOP-13786-HADOOP-13345-021.patch, HADOOP-13786-HADOOP-13345-022.patch, 
> HADOOP-13786-HADOOP-13345-023.patch, HADOOP-13786-HADOOP-13345-024.patch, 
> HADOOP-13786-HADOOP-13345-025.patch, objectstore.pdf, s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints

Reply via email to