[
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903528#comment-15903528
]
Steve Loughran commented on HADOOP-13786:
-----------------------------------------
Right now I'm doing this out of an ASF branch, rebasing onto the HADOOP-13345
branch regularly. That way I can do things without adult supervision
https://github.com/steveloughran/hadoop/tree/s3guard/HADOOP-13786-committer
Test-wise, I've been focusing on the integration tests
{{org.apache.hadoop.fs.s3a.commit.staging.ITestStagingCommitProtocol}}, which
are derived from
{{org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter}}; because
they come from there, they form part of the expectations of the commit
protocol implementation (more precisely, they're the closest we have to any
definition). All these initial tests work, except for those generating files
in a subdirectory, e.g. {{part-0000/subfile}}, something critical for the
intermediate output of an MR job.
The scanner for files to upload is just doing a flat list and then getting into
trouble when it gets handed a directory to upload instead of a simple file.
{code}
java.io.FileNotFoundException: Not a file /Users/stevel/Projects/hadoop-trunk/hadoop-tools/hadoop-aws/target/tmp/mapred/local/job_200707121733_0001/_temporary/0/_temporary/attempt_200707121733_0001_m_000000_0/part-m-00000
	at org.apache.hadoop.fs.s3a.commit.staging.S3Util.multipartUpload(S3Util.java:104)
	at org.apache.hadoop.fs.s3a.commit.staging.StagingS3GuardCommitter$8.run(StagingS3GuardCommitter.java:784)
	at org.apache.hadoop.fs.s3a.commit.staging.StagingS3GuardCommitter$8.run(StagingS3GuardCommitter.java:771)
	at org.apache.hadoop.fs.s3a.commit.staging.Tasks$Builder.runSingleThreaded(Tasks.java:122)
	at org.apache.hadoop.fs.s3a.commit.staging.Tasks$Builder.run(Tasks.java:108)
	at org.apache.hadoop.fs.s3a.commit.staging.StagingS3GuardCommitter.commitTaskInternal(StagingS3GuardCommitter.java:771)
	at org.apache.hadoop.fs.s3a.commit.staging.StagingS3GuardCommitter.commitTask(StagingS3GuardCommitter.java:716)
	at org.apache.hadoop.fs.s3a.commit.AbstractITCommitProtocol.commit(AbstractITCommitProtocol.java:684)
	at org.apache.hadoop.fs.s3a.commit.AbstractITCommitProtocol.testMapFileOutputCommitter(AbstractITCommitProtocol.java:567)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
What I need to do is go from the flat {{listFiles()}} to the recursive one,
then use that to compute each file's offset under the final destination.
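The fix amounts to: walk the task attempt directory recursively, and for each file compute its path relative to that directory; that relative path is the suffix to append to the job's final destination. In Hadoop the recursive listing would be {{FileSystem.listFiles(path, true)}}; the sketch below uses {{java.nio}} as a self-contained stand-in, and {{RecursiveScan}}/{{relativeFiles}} are hypothetical names, not the committer's actual methods:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveScan {

    /**
     * Recursively list every regular file under root and return its path
     * relative to root. At commit time each entry would be appended to the
     * job's destination directory to form the final object key.
     * (Hypothetical helper; stands in for a recursive listFiles() scan.)
     */
    static List<String> relativeFiles(Path root) throws IOException {
        try (Stream<Path> s = Files.walk(root)) {
            return s.filter(Files::isRegularFile)
                    .map(p -> root.relativize(p).toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a task attempt dir whose "part" output is a directory
        // (MapFile-style: part-m-00000/index and part-m-00000/data), the
        // case the flat listing chokes on.
        Path attempt = Files.createTempDirectory("attempt_m_000000_0");
        Path part = Files.createDirectories(attempt.resolve("part-m-00000"));
        Files.createFile(part.resolve("index"));
        Files.createFile(part.resolve("data"));

        // Only files come back, each with the subdirectory prefix intact.
        System.out.println(relativeFiles(attempt));
    }
}
```

A flat listing would hand the uploader the {{part-m-00000}} directory itself and fail as in the stack trace above; the recursive walk only ever yields plain files, with the subdirectory structure preserved in the relative path.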
Regarding the mock tests, it's all happening because bits of the code now
expect an S3AFS, and what's being mocked is only a base FileSystem with a
limited set of operations.
It's not going to be quite enough to mock S3AFS unless those extra methods
come in (which will surface as we try).
FWIW, I'd actually prefer that, wherever possible, real integration tests be
used over mock ones. Yes, they are slower; no, Yetus and Jenkins don't run
them; but they really do test the endpoint, and will catch regressions in the
S3A client itself, quirks in different S3 implementations, etc. Given you've
written them, it'll be good to get them working. And the fault generation is
great for testing the resilience of the committer.
> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13786-HADOOP-13345-001.patch,
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch,
> HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch,
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch,
> HADOOP-13786-HADOOP-13345-007.patch, HADOOP-13786-HADOOP-13345-009.patch,
> s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the
> presence of failures". Implement it, including whatever is needed to
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard
> provides a consistent view of the presence/absence of blobs, show that we can
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output
> streams (ie. not visible until the close()), if we need to use that to allow
> us to abort commit operations.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)