[
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905498#comment-15905498
]
Ryan Blue commented on HADOOP-13786:
------------------------------------
For the staging committer drawbacks, I think there's a clear path to avoid them.

The committer is not intended to instantiate its own S3Client. It does so for
testing, but when it is integrated with S3A it should be passed a configured
client when it is instantiated, or should use package-local access to get one
from the S3A FS object. In other words, the default {{findClient}} method
shouldn't be used; we don't use it other than for testing. My intent was for
S3A to have a {{FileSystem#newOutputCommitter(Path, JobContext)}} factory
method. That way, the FS can pass its internal S3 client to the committer
instead of a second client being instantiated.
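
A minimal sketch of the factory idea, to make the ownership of the client
concrete. Every type here ({{ObjectStoreClient}}, {{SketchFileSystem}},
{{SketchCommitter}}) is a hypothetical stand-in, not the real S3A, Hadoop, or
AWS SDK class, and the factory takes a plain string rather than
{{(Path, JobContext)}} to stay self-contained:

```java
// Hypothetical stand-in for the S3 client; not the AWS SDK type.
interface ObjectStoreClient { }

// Hypothetical committer: the client is injected, never created here.
class SketchCommitter {
    private final ObjectStoreClient client;
    private final String outputPath;

    SketchCommitter(ObjectStoreClient client, String outputPath) {
        this.client = java.util.Objects.requireNonNull(client, "client");
        this.outputPath = outputPath;
    }

    ObjectStoreClient client() { return client; }

    String outputPath() { return outputPath; }
}

// Hypothetical filesystem that owns exactly one client.
class SketchFileSystem {
    private final ObjectStoreClient client = new ObjectStoreClient() { };

    // Factory method: every committer shares the filesystem's own client.
    SketchCommitter newOutputCommitter(String outputPath) {
        return new SketchCommitter(client, outputPath);
    }

    ObjectStoreClient internalClient() { return client; }
}
```

With this shape, any number of committers created from one filesystem
instance reuse a single client, and a test can still construct a committer
directly with a mock client.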

The storage on local disk isn't a requirement. We can replace that with an
output stream that buffers in memory and sends parts to S3 when they are ready
(we're planning on doing this eventually). This is just waiting on a stable API
that can close a stream without committing its data. Since the committer API
currently expects tasks to create files underneath the work path, we'll have to
figure out how tasks can get a multi-part stream that is committed later
without using a different method.
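
The close-without-commit behaviour described above can be sketched as follows.
This is an illustration under stated assumptions only: {{PendingUpload}} and
{{DeferredCommitStream}} are hypothetical names, and the "upload" is simulated
with an in-memory list instead of real S3 multipart calls:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a multipart upload; parts are held in memory.
class PendingUpload {
    private final List<byte[]> parts = new ArrayList<>();
    private boolean committed;

    void uploadPart(byte[] part) {
        if (committed) {
            throw new IllegalStateException("already committed");
        }
        parts.add(part);
    }

    // Only this call makes the data visible; the committer invokes it later.
    void commit() { committed = true; }

    // Discards all uploaded parts; only valid before commit.
    void abort() {
        if (committed) {
            throw new IllegalStateException("already committed");
        }
        parts.clear();
    }

    boolean isVisible() { return committed; }

    int partCount() { return parts.size(); }
}

// Buffers writes in memory and ships a part whenever the buffer is full.
class DeferredCommitStream extends OutputStream {
    private final PendingUpload upload;
    private final int partSize;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean closed;

    DeferredCommitStream(PendingUpload upload, int partSize) {
        this.upload = upload;
        this.partSize = partSize;
    }

    @Override
    public void write(int b) {
        if (closed) {
            throw new IllegalStateException("stream closed");
        }
        buffer.write(b);
        if (buffer.size() >= partSize) {
            flushPart();
        }
    }

    private void flushPart() {
        if (buffer.size() > 0) {
            upload.uploadPart(buffer.toByteArray());
            buffer.reset();
        }
    }

    // Sends the final partial part but deliberately does NOT commit.
    @Override
    public void close() {
        if (!closed) {
            flushPart();
            closed = true;
        }
    }
}
```

The task only ever sees the stream; the {{PendingUpload}} handle is what would
be passed to the job committer, which can then call {{commit()}} or
{{abort()}} once the task outcome is known.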

We can also pass in a thread pool if there is a better one to use. I think this
is separate enough that it should be easy.
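
Passing the pool in could look roughly like this. {{PooledUploader}} and its
methods are hypothetical names used only for illustration, and the "upload"
task just returns a string:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper: runs uploads on a caller-supplied pool.
class PooledUploader {
    // Owned by the caller; this class never creates or shuts it down.
    private final ExecutorService pool;

    PooledUploader(ExecutorService pool) {
        this.pool = pool;
    }

    // Submits one task per file and waits for all of them to finish.
    List<String> uploadAll(List<String> files) {
        List<Future<String>> futures = new ArrayList<>();
        for (String file : files) {
            Callable<String> task = () -> "uploaded:" + file;
            futures.add(pool.submit(task));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> future : futures) {
            try {
                results.add(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("upload failed", e.getCause());
            }
        }
        return results;
    }
}
```

Because the pool is a constructor argument, S3A could hand over whichever
executor it already manages, and tests could supply a small fixed pool.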
> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13786-HADOOP-13345-001.patch,
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch,
> HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch,
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch,
> HADOOP-13786-HADOOP-13345-007.patch, HADOOP-13786-HADOOP-13345-009.patch,
> HADOOP-13786-HADOOP-13345-010.patch, s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the
> presence of failures". Implement it, including whatever is needed to
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard
> provides a consistent view of the presence/absence of blobs, show that we can
> commit directly).
> I consider us free to expose the blobstore-ness of the s3 output
> streams (i.e. not visible until the close()), if we need to use that to allow
> us to abort commit operations.