[https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207733#comment-16207733]

Steve Loughran commented on HADOOP-13786:
-----------------------------------------

I recognise the fear of long-lived branches, and we can probably get away with 
incremental s3guard stuff from now on. This committer is mostly done apart 
from some docs & tuning of related bits. The only differences between this patch 
and my local source are a new {{AWSStatus500Exception}}, treating a 500 
response as retriable even on ops considered non-idempotent, and some more 
docs on mapreduce task commit/abort.
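To make the retry-around-closures model and the 500 change concrete, here is a minimal Java sketch. It is not the {{Invoker}} or {{S3ARetryPolicy}} code: {{HttpStatusException}}, {{RetryInvoker}}, the linear backoff and the exact status-code rules are all hypothetical, chosen only to show an operation passed in as a closure and a policy that retries a 500 even when the operation is flagged as non-idempotent.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical exception carrying an HTTP status code; not an AWS SDK type. */
class HttpStatusException extends IOException {
  final int status;

  HttpStatusException(int status) {
    super("HTTP " + status);
    this.status = status;
  }
}

/** Minimal retry-around-closure invoker; names, limits and rules are illustrative. */
final class RetryInvoker {
  private final int maxAttempts;
  private final long baseSleepMillis;

  RetryInvoker(int maxAttempts, long baseSleepMillis) {
    this.maxAttempts = maxAttempts;
    this.baseSleepMillis = baseSleepMillis;
  }

  /** Hypothetical policy: should a failure with this status be retried? */
  static boolean shouldRetry(int status, boolean idempotent) {
    if (status == 500) {
      // Retried even for operations considered non-idempotent.
      return true;
    }
    if (status >= 500) {
      // Other server errors: only retry when repeating the op is known to be safe.
      return idempotent;
    }
    // 4xx responses are not retried.
    return false;
  }

  /**
   * Run the closure, retrying retriable HTTP failures with a linear backoff.
   * Non-HTTP exceptions are not retried in this sketch; they propagate as-is.
   */
  <T> T retry(boolean idempotent, Callable<T> operation) throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return operation.call();
      } catch (HttpStatusException e) {
        if (attempt >= maxAttempts || !shouldRetry(e.status, idempotent)) {
          throw e;
        }
        // Back off a little longer on each attempt before retrying.
        Thread.sleep(baseSleepMillis * attempt);
      }
    }
  }
}
```

The appeal of the closure style is that the retry decision lives in one place, and each call site only has to state whether its operation is idempotent.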

For review, you can split it up and look at:

# the retry logic: {{Invoker}}, {{S3ARetryPolicy}}, and how they are used to 
wrap operations in S3AFileSystem, a refactored WriteOperationsHelper, and 
DynamoDBMetadataStore. Is the model right (retry-around-closures), is the retry 
policy good, and is it being used correctly?
# changes to S3ABlockOutputStream to let us control whether it does a delayed 
complete or not, and changes to S3AFS to recognise the special paths and so switch 
policy
# CommitOperations: the underlying integration with the FS to save/restore 
lists of PUTs to complete, and the operations to commit them
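A toy model of items 2 and 3 (delayed completion plus saved lists of pending PUTs) might look like the Java sketch below. Everything in it is hypothetical ({{ToyStore}}, {{PendingPut}}, {{DelayedCommitDemo}}); it only relies on the S3 property that a multipart upload stays invisible until it is completed, so a task can upload its data, record what is pending, and leave the job committer to complete or abort it without any rename.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory store: multipart uploads stay invisible until completed. */
final class ToyStore {
  private final Map<String, String> visible = new HashMap<>();
  private final Map<String, StringBuilder> pending = new HashMap<>();
  private int nextId = 0;

  String initiateUpload(String dest) {
    String uploadId = dest + "#" + (nextId++);
    pending.put(uploadId, new StringBuilder());
    return uploadId;
  }

  void uploadPart(String uploadId, String data) {
    pending.get(uploadId).append(data);
  }

  void complete(String dest, String uploadId) {
    // Only now does the object appear at its destination path.
    visible.put(dest, pending.remove(uploadId).toString());
  }

  void abort(String uploadId) {
    pending.remove(uploadId);
  }

  boolean exists(String path) {
    return visible.containsKey(path);
  }
}

/** Hypothetical record a task would persist (e.g. as JSON) for the job committer. */
final class PendingPut {
  final String dest;
  final String uploadId;

  PendingPut(String dest, String uploadId) {
    this.dest = dest;
    this.uploadId = uploadId;
  }
}

/** Sketch of the flow: tasks write with delayed completion, the job completes or aborts. */
final class DelayedCommitDemo {
  /** Task side: upload the data but, instead of completing on close(), record the pending PUT. */
  static PendingPut writeWithoutCompleting(ToyStore store, String dest, String data) {
    String uploadId = store.initiateUpload(dest);
    store.uploadPart(uploadId, data);
    return new PendingPut(dest, uploadId);
  }

  /** Job commit: complete every pending PUT; no copy or rename of the data is needed. */
  static void commitJob(ToyStore store, List<PendingPut> pendingPuts) {
    for (PendingPut p : pendingPuts) {
      store.complete(p.dest, p.uploadId);
    }
  }

  /** Job abort: discard the uploads so nothing ever becomes visible. */
  static void abortJob(ToyStore store, List<PendingPut> pendingPuts) {
    for (PendingPut p : pendingPuts) {
      store.abort(p.uploadId);
    }
  }
}
```

In this model the cost of the job commit is one completion call per file, independent of the data size, which is the property the committer is after.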

Then there are the committers themselves: {{AbstractS3GuardCommitter}}, 
{{StagingCommitter}} (Ryan's), and {{MagicS3GuardCommitter}}, which is the one 
using the special output streams. They all use the same bindings to the FS and 
the same JSON file formats, so they differ in where work goes and how the commit 
metadata is passed to the job committer. For the staging committer, there are 
also the conflict policies of its two public implementations: Directory and Partitioned.
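As a rough illustration of how the two staging conflict policies differ in scope, here is a simplified Java model. The {{ConflictMode}} values (fail, append, replace) and the method names are assumptions for this sketch, not the committer's API; the point is only that Directory resolves conflicts against the whole destination, while Partitioned looks only at the partitions the job actually writes.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical conflict modes assumed for this sketch. */
enum ConflictMode { FAIL, APPEND, REPLACE }

/** Simplified model of directory-scoped vs partition-scoped conflict resolution. */
final class ConflictResolution {
  private ConflictResolution() {}

  /**
   * Directory scope: the destination as a whole is the unit of conflict.
   * Returns the existing paths to delete before committing.
   */
  static Set<String> directoryPolicy(ConflictMode mode, Set<String> existing) {
    switch (mode) {
      case FAIL:
        if (!existing.isEmpty()) {
          throw new IllegalStateException("destination directory already has data");
        }
        return Set.of();
      case REPLACE:
        // Wipe everything under the destination.
        return new HashSet<>(existing);
      default:
        // APPEND: keep all existing data.
        return Set.of();
    }
  }

  /**
   * Partitioned scope: only partitions this job writes to are considered.
   * Returns the existing partitions to delete before committing.
   */
  static Set<String> partitionedPolicy(ConflictMode mode, Set<String> existing, Set<String> written) {
    Set<String> overlap = new HashSet<>(existing);
    overlap.retainAll(written);
    switch (mode) {
      case FAIL:
        if (!overlap.isEmpty()) {
          throw new IllegalStateException("partitions already exist: " + overlap);
        }
        return Set.of();
      case REPLACE:
        // Only the overlapping partitions go; untouched partitions survive.
        return overlap;
      default:
        // APPEND: keep all existing data.
        return Set.of();
    }
  }
}
```

For example, with existing partitions {{year=2016}} and {{year=2017}} and a job writing {{year=2017}} and {{year=2018}}, "replace" in this model deletes only {{year=2017}} under the partitioned policy, but both existing partitions under the directory policy.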

The test {{AbstractITCommitProtocol}} is the one which pushes the commit 
protocol through its lifecycle, trying to recreate the valid and failure 
workflows. That's inevitably where there's scope to cover more of the corner 
cases. I think I'll look again at speculation there.
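For anyone new to the protocol, the lifecycle that test drives can be pictured as a small state machine. This is a toy model with hypothetical names, not Hadoop's {{OutputCommitter}}; the invariants it encodes are assumptions made for the sketch: tasks only commit after job setup, at most one attempt per task may commit (so a speculative duplicate is rejected), and a committed job cannot then be aborted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the job/task commit lifecycle; hypothetical, not Hadoop's OutputCommitter. */
final class CommitProtocolModel {
  enum JobState { NEW, SETUP, COMMITTED, ABORTED }

  private JobState job = JobState.NEW;
  // Task attempts that have been set up and not aborted.
  private final Set<String> liveAttempts = new HashSet<>();
  // taskId -> attemptId of the attempt whose output was committed.
  private final Map<String, String> committedTasks = new HashMap<>();

  void setupJob() {
    require(job == JobState.NEW, "setupJob");
    job = JobState.SETUP;
  }

  void setupTask(String attemptId) {
    require(job == JobState.SETUP, "setupTask");
    liveAttempts.add(attemptId);
  }

  /** Commit one attempt; a second (e.g. speculative) attempt of the same task is rejected. */
  void commitTask(String taskId, String attemptId) {
    require(job == JobState.SETUP && liveAttempts.contains(attemptId), "commitTask");
    require(!committedTasks.containsKey(taskId), "commitTask of already-committed task " + taskId);
    committedTasks.put(taskId, attemptId);
  }

  /** An aborted attempt can no longer commit. */
  void abortTask(String attemptId) {
    liveAttempts.remove(attemptId);
  }

  /** The job only commits once every expected task has exactly one committed attempt. */
  void commitJob(Set<String> expectedTasks) {
    require(job == JobState.SETUP, "commitJob");
    require(committedTasks.keySet().equals(expectedTasks), "commitJob with uncommitted tasks");
    job = JobState.COMMITTED;
  }

  void abortJob() {
    require(job != JobState.COMMITTED, "abortJob after commit");
    job = JobState.ABORTED;
  }

  JobState state() {
    return job;
  }

  private static void require(boolean ok, String what) {
    if (!ok) {
      throw new IllegalStateException("invalid transition: " + what);
    }
  }
}
```

A lifecycle test in this style would drive both valid sequences and invalid ones, such as the duplicate speculative commit, and check that the invalid ones are refused.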

Finally, there are new docs, including one on [committer 
architecture|https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md].
That covers what the MR commit protocol is, which is something you need to 
understand before looking at the commit internals. That doc is probably the 
most complete discussion of the topic there is, and even it avoids bits I don't 
understand (preemption).

I can give a talk on this stuff on Wednesday or Thursday AM PST if people want.

> Add S3Guard committer for zero-rename commits to S3 endpoints
> -------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-036.patch, HADOOP-13786-037.patch, 
> HADOOP-13786-038.patch, HADOOP-13786-039.patch, 
> HADOOP-13786-HADOOP-13345-001.patch, HADOOP-13786-HADOOP-13345-002.patch, 
> HADOOP-13786-HADOOP-13345-003.patch, HADOOP-13786-HADOOP-13345-004.patch, 
> HADOOP-13786-HADOOP-13345-005.patch, HADOOP-13786-HADOOP-13345-006.patch, 
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-007.patch, 
> HADOOP-13786-HADOOP-13345-009.patch, HADOOP-13786-HADOOP-13345-010.patch, 
> HADOOP-13786-HADOOP-13345-011.patch, HADOOP-13786-HADOOP-13345-012.patch, 
> HADOOP-13786-HADOOP-13345-013.patch, HADOOP-13786-HADOOP-13345-015.patch, 
> HADOOP-13786-HADOOP-13345-016.patch, HADOOP-13786-HADOOP-13345-017.patch, 
> HADOOP-13786-HADOOP-13345-018.patch, HADOOP-13786-HADOOP-13345-019.patch, 
> HADOOP-13786-HADOOP-13345-020.patch, HADOOP-13786-HADOOP-13345-021.patch, 
> HADOOP-13786-HADOOP-13345-022.patch, HADOOP-13786-HADOOP-13345-023.patch, 
> HADOOP-13786-HADOOP-13345-024.patch, HADOOP-13786-HADOOP-13345-025.patch, 
> HADOOP-13786-HADOOP-13345-026.patch, HADOOP-13786-HADOOP-13345-027.patch, 
> HADOOP-13786-HADOOP-13345-028.patch, HADOOP-13786-HADOOP-13345-028.patch, 
> HADOOP-13786-HADOOP-13345-029.patch, HADOOP-13786-HADOOP-13345-030.patch, 
> HADOOP-13786-HADOOP-13345-031.patch, HADOOP-13786-HADOOP-13345-032.patch, 
> HADOOP-13786-HADOOP-13345-033.patch, HADOOP-13786-HADOOP-13345-035.patch, 
> cloud-intergration-test-failure.log, objectstore.pdf, s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider us free to expose the blobstore-ness of the s3 output 
> streams (i.e. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
