[
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903528#comment-15903528
]
Steve Loughran commented on HADOOP-13786:
-----------------------------------------
Right now I'm doing this out of an ASF branch, rebasing onto the HADOOP-13345
branch regularly. That way I can do things without adult supervision
https://github.com/steveloughran/hadoop/tree/s3guard/HADOOP-13786-committer
Test-wise, I've been focusing on the integration tests
{{org.apache.hadoop.fs.s3a.commit.staging.ITestStagingCommitProtocol}}, which
are derived from
{{org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter}}; because
they come from there, they form part of the expectations of the commit
protocol implementation (more precisely, they're the closest we have to any
definition). All these initial tests work, except for those generating files
in a subdirectory, e.g. {{part-0000/subfile}}, something critical for the
intermediate output of an MR job.
The scanner for files to upload is just doing a flat list and then getting into
trouble when it gets handed a directory to upload instead of a simple file.
{code}
java.io.FileNotFoundException: Not a file /Users/stevel/Projects/hadoop-trunk/hadoop-tools/hadoop-aws/target/tmp/mapred/local/job_200707121733_0001/_temporary/0/_temporary/attempt_200707121733_0001_m_000000_0/part-m-00000
	at org.apache.hadoop.fs.s3a.commit.staging.S3Util.multipartUpload(S3Util.java:104)
	at org.apache.hadoop.fs.s3a.commit.staging.StagingS3GuardCommitter$8.run(StagingS3GuardCommitter.java:784)
	at org.apache.hadoop.fs.s3a.commit.staging.StagingS3GuardCommitter$8.run(StagingS3GuardCommitter.java:771)
	at org.apache.hadoop.fs.s3a.commit.staging.Tasks$Builder.runSingleThreaded(Tasks.java:122)
	at org.apache.hadoop.fs.s3a.commit.staging.Tasks$Builder.run(Tasks.java:108)
	at org.apache.hadoop.fs.s3a.commit.staging.StagingS3GuardCommitter.commitTaskInternal(StagingS3GuardCommitter.java:771)
	at org.apache.hadoop.fs.s3a.commit.staging.StagingS3GuardCommitter.commitTask(StagingS3GuardCommitter.java:716)
	at org.apache.hadoop.fs.s3a.commit.AbstractITCommitProtocol.commit(AbstractITCommitProtocol.java:684)
	at org.apache.hadoop.fs.s3a.commit.AbstractITCommitProtocol.testMapFileOutputCommitter(AbstractITCommitProtocol.java:567)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
What I need to do is go from the flat {{listFiles()}} to the recursive one,
then use that to compute each file's offset under the final destination.
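The fix amounts to: walk the task attempt directory recursively, and for each file compute its path relative to that directory; that relative path is the suffix to append to the job's final destination. In Hadoop the recursive listing would be {{FileSystem.listFiles(path, true)}}; the sketch below uses {{java.nio}} as a self-contained stand-in, and {{RecursiveScan}}/{{relativeFiles}} are hypothetical names, not the committer's actual methods:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveScan {

    /**
     * Recursively list every regular file under root and return its path
     * relative to root. At commit time each entry would be appended to the
     * job's destination directory to form the final object key.
     * (Hypothetical helper; stands in for a recursive listFiles() scan.)
     */
    static List<String> relativeFiles(Path root) throws IOException {
        try (Stream<Path> s = Files.walk(root)) {
            return s.filter(Files::isRegularFile)
                    .map(p -> root.relativize(p).toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a task attempt dir whose "part" output is a directory
        // (MapFile-style: part-m-00000/index and part-m-00000/data), the
        // case the flat listing chokes on.
        Path attempt = Files.createTempDirectory("attempt_m_000000_0");
        Path part = Files.createDirectories(attempt.resolve("part-m-00000"));
        Files.createFile(part.resolve("index"));
        Files.createFile(part.resolve("data"));

        // Only files come back, each with the subdirectory prefix intact.
        System.out.println(relativeFiles(attempt));
    }
}
```

A flat listing would hand the uploader the {{part-m-00000}} directory itself and fail as in the stack trace above; the recursive walk only ever yields plain files, with the subdirectory structure preserved in the relative path.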
Regarding the mock tests, it's all happening because bits of the code now
expect an S3AFS, and what's being mocked is only a base FileSystem with a
limited set of operations.
It's not going to be quite enough to mock S3AFS unless those extra methods
come in (which will surface as we try).
FWIW, I'd actually prefer that, wherever possible, real integration tests be
used over mock ones. Yes, they are slower; no, Yetus and Jenkins don't run
them; but they really do test the endpoint, and will catch regressions in the
S3A client itself, quirks in different S3 implementations, etc. Given you've
written them, it'll be good to get them working. And the fault generation is
great for testing the resilience of the committer.
> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13786-HADOOP-13345-001.patch,
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch,
> HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch,
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch,
> HADOOP-13786-HADOOP-13345-007.patch, HADOOP-13786-HADOOP-13345-009.patch,
> s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the
> presence of failures". Implement it, including whatever is needed to
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard
> provides a consistent view of the presence/absence of blobs, show that we can
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output
> streams (ie. not visible until the close()), if we need to use that to allow
> us to abort commit operations.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)