[
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207733#comment-16207733
]
Steve Loughran commented on HADOOP-13786:
-----------------------------------------
I recognise the fear of long-lived branches, and we can probably get away with
incremental s3guard stuff from now on. This committer is mostly done apart
from some docs & tuning of related bits. The sole current diff between this
patch and my local source is a new {{AWSStatus500Exception}}, treating a 500
response as retriable even on ops considered non-idempotent, and some more
docs on mapreduce task commit/abort.
For review, you can split off looking at
# the retry logic: {{Invoker}}, {{S3ARetryPolicy}} & how that's being used to
wrap operations in S3AFileSystem, a refactored WriteOperationsHelper and
DynamoDBMetadataStore. Is the model right (retry-around-closures), is the retry
policy good, and is it being used correctly?
# Changes to S3ABlockOutputStream to let us control whether it's
delayed-complete or not, & changes to S3AFS to recognise the special paths and
so switch policy
# CommitOperations: the underlying integration with the FS to save/restore
lists of PUTs to complete, operations to commit them
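To make the retry-around-closures question in the first item concrete, here is a minimal sketch of the pattern. It is not the {{Invoker}} or {{S3ARetryPolicy}} code from the patch: {{RetrySketch}}, {{Status500Exception}}, {{RetryPolicy}} and {{simplePolicy}} are hypothetical names invented for illustration, and a real policy would sleep with backoff between attempts. The only behaviour taken from the comment above is that a 500 response is retried even when the operation is not considered idempotent.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {

    /** Hypothetical stand-in for a "service returned HTTP 500" failure. */
    static class Status500Exception extends IOException {
        Status500Exception(String msg) {
            super(msg);
        }
    }

    /** Hypothetical policy: decides whether a failed attempt may be retried. */
    interface RetryPolicy {
        boolean shouldRetry(Exception e, int attempts, boolean idempotent);
    }

    /**
     * Retries a 500 regardless of idempotency; retries any other
     * IOException only when the operation is declared idempotent.
     */
    static RetryPolicy simplePolicy(final int maxAttempts) {
        return (e, attempts, idempotent) -> {
            if (attempts >= maxAttempts) {
                return false;
            }
            if (e instanceof Status500Exception) {
                return true;
            }
            return idempotent && e instanceof IOException;
        };
    }

    /** Runs the closure, re-invoking it for as long as the policy allows. */
    static <T> T retry(String action, boolean idempotent, RetryPolicy policy,
                       Callable<T> operation) throws Exception {
        int attempts = 0;
        while (true) {
            try {
                return operation.call();
            } catch (Exception e) {
                attempts++;
                if (!policy.shouldRetry(e, attempts, idempotent)) {
                    throw e; // give up: surface the last failure to the caller
                }
                // A real implementation would sleep with backoff here.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // A non-idempotent operation that fails twice with a simulated 500.
        String result = retry("PUT", false, simplePolicy(3), () -> {
            if (++calls[0] < 3) {
                throw new Status500Exception("simulated 500");
            }
            return "done";
        });
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

The point of wrapping the closure rather than the individual call sites is that the retry decision lives in one place, so "is it being used correctly" comes down to whether each wrapped operation declares its idempotency honestly.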
Then the committers themselves: {{AbstractS3GuardCommitter}},
{{StagingCommitter}} (Ryan's), and {{MagicS3GuardCommitter}}, which is the one
using the special output streams. They all use the same bindings to the FS and
JSON file formats, so they differ in where the work goes and how the commit
metadata is passed to the job committer. And for the staging committer, the
conflict policies of the two public implementations, Directory and Partitioned.
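As a rough illustration of the idea the committers share (record the uploads a task has written but not completed, then let the job committer complete or abort them), here is a toy in-memory model. It is only a sketch under stated assumptions: {{PendingCommitSketch}}, {{ToyStore}} and {{PendingUpload}} are hypothetical names, there is no S3 or JSON persistence involved, and it does not reflect the actual classes or file formats in the patch. The one property it models is that uploaded data is not visible at the destination until the upload is completed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PendingCommitSketch {

    /** Hypothetical record of an upload that is written but not yet completed. */
    static final class PendingUpload {
        final String destKey;
        final String uploadId;

        PendingUpload(String destKey, String uploadId) {
            this.destKey = destKey;
            this.uploadId = uploadId;
        }
    }

    /** Toy store: data is staged under an upload id and only visible after complete(). */
    static final class ToyStore {
        private final Map<String, String> staged = new HashMap<>();
        private final Map<String, String> visible = new HashMap<>();
        private int nextId = 0;

        PendingUpload startUpload(String destKey, String data) {
            String id = "upload-" + (nextId++);
            staged.put(id, data);
            return new PendingUpload(destKey, id);
        }

        /** Commit: the staged data becomes visible at its destination key. */
        void complete(PendingUpload p) {
            String data = staged.remove(p.uploadId);
            if (data == null) {
                throw new IllegalStateException("unknown upload " + p.uploadId);
            }
            visible.put(p.destKey, data);
        }

        /** Abort: the staged data is discarded and never becomes visible. */
        void abort(PendingUpload p) {
            staged.remove(p.uploadId);
        }

        boolean exists(String key) {
            return visible.containsKey(key);
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();

        // Each task keeps a list of the uploads it has pending.
        List<PendingUpload> committedTask = new ArrayList<>();
        committedTask.add(store.startUpload("out/part-0000", "a"));
        List<PendingUpload> failedTask = new ArrayList<>();
        failedTask.add(store.startUpload("out/part-0001", "b"));

        System.out.println(store.exists("out/part-0000")); // false: nothing completed yet

        // Job commit: complete the work of committed tasks, abort the rest.
        for (PendingUpload p : committedTask) {
            store.complete(p);
        }
        for (PendingUpload p : failedTask) {
            store.abort(p);
        }

        System.out.println(store.exists("out/part-0000")); // true
        System.out.println(store.exists("out/part-0001")); // false
    }
}
```

In this model no data is copied or renamed at commit time; the job commit is just a pass over the saved lists, which is what makes the "where the work goes" and "how the list reaches the job committer" questions the interesting differences between implementations.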
The test {{AbstractITCommitProtocol}} is the one which pushes the commit
protocol through its lifecycle, trying to recreate the valid & failure
workflows. That's inevitably where there's scope to cover all the corner
cases... I think I'll look again at speculation there.
Finally, new docs, including one on [committer
architecture|https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md].
That covers what the MR commit protocol is, which is something you need to
understand before looking at the commit internals. That doc is probably the
most complete discussion on the topic there is, and even it avoids the bits I
don't understand (preemption).
I can give a talk on this stuff on Wednesday or Thursday AM PST if people want.
> Add S3Guard committer for zero-rename commits to S3 endpoints
> -------------------------------------------------------------
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
> Affects Versions: 3.0.0-beta1
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13786-036.patch, HADOOP-13786-037.patch,
> HADOOP-13786-038.patch, HADOOP-13786-039.patch,
> HADOOP-13786-HADOOP-13345-001.patch, HADOOP-13786-HADOOP-13345-002.patch,
> HADOOP-13786-HADOOP-13345-003.patch, HADOOP-13786-HADOOP-13345-004.patch,
> HADOOP-13786-HADOOP-13345-005.patch, HADOOP-13786-HADOOP-13345-006.patch,
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-007.patch,
> HADOOP-13786-HADOOP-13345-009.patch, HADOOP-13786-HADOOP-13345-010.patch,
> HADOOP-13786-HADOOP-13345-011.patch, HADOOP-13786-HADOOP-13345-012.patch,
> HADOOP-13786-HADOOP-13345-013.patch, HADOOP-13786-HADOOP-13345-015.patch,
> HADOOP-13786-HADOOP-13345-016.patch, HADOOP-13786-HADOOP-13345-017.patch,
> HADOOP-13786-HADOOP-13345-018.patch, HADOOP-13786-HADOOP-13345-019.patch,
> HADOOP-13786-HADOOP-13345-020.patch, HADOOP-13786-HADOOP-13345-021.patch,
> HADOOP-13786-HADOOP-13345-022.patch, HADOOP-13786-HADOOP-13345-023.patch,
> HADOOP-13786-HADOOP-13345-024.patch, HADOOP-13786-HADOOP-13345-025.patch,
> HADOOP-13786-HADOOP-13345-026.patch, HADOOP-13786-HADOOP-13345-027.patch,
> HADOOP-13786-HADOOP-13345-028.patch, HADOOP-13786-HADOOP-13345-028.patch,
> HADOOP-13786-HADOOP-13345-029.patch, HADOOP-13786-HADOOP-13345-030.patch,
> HADOOP-13786-HADOOP-13345-031.patch, HADOOP-13786-HADOOP-13345-032.patch,
> HADOOP-13786-HADOOP-13345-033.patch, HADOOP-13786-HADOOP-13345-035.patch,
> cloud-intergration-test-failure.log, objectstore.pdf, s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the
> presence of failures". Implement it, including whatever is needed to
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard
> provides a consistent view of the presence/absence of blobs, show that we can
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output
> streams (ie. not visible until the close()), if we need to use that to allow
> us to abort commit operations.