[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851781#comment-15851781
 ] 

Steve Loughran commented on HADOOP-13786:
-----------------------------------------

About to add a new patch, which does more in terms of testing, though there's 
some ambiguity about the semantics of commit and abort that I need to clarify 
with the MR Team.

h2. This code works without s3guard, but fails (differently) with s3guard local and s3guard dynamo

s3guard DynamoDB
{code}

Running org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter
Tests run: 9, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 94.915 sec <<< 
FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter
testAbort(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter)  Time 
elapsed: 9.292 sec  <<< FAILURE!
java.lang.AssertionError: Output directory not empty ls 
s3a://hwdev-steve-ireland-new/test/testAbort [00] 
S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testAbort/part-m-00000; 
isDirectory=false; length=40; replication=1; blocksize=33554432; 
modification_time=1486142401246; access_time=0; owner=stevel; group=stevel; 
permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false
: array lengths differed, expected.length=0 actual.length=1
        at org.junit.Assert.fail(Assert.java:88)
        at 
org.junit.internal.ComparisonCriteria.assertArraysAreSameLength(ComparisonCriteria.java:71)
        at 
org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:32)
        at org.junit.Assert.internalArrayEquals(Assert.java:473)
        at org.junit.Assert.assertArrayEquals(Assert.java:265)
        at 
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testAbort(ITestS3AOutputCommitter.java:561)

testFailAbort(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter)  Time 
elapsed: 9.453 sec  <<< FAILURE!
java.lang.AssertionError: expected output file: unexpectedly found 
s3a://hwdev-steve-ireland-new/test/testFailAbort/part-m-00000 as  
S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testFailAbort/part-m-00000;
 isDirectory=false; length=40; replication=1; blocksize=33554432; 
modification_time=1486142441390; access_time=0; owner=stevel; group=stevel; 
permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false
        at org.junit.Assert.fail(Assert.java:88)
        at 
org.apache.hadoop.fs.contract.ContractTestUtils.assertPathDoesNotExist(ContractTestUtils.java:796)
        at 
org.apache.hadoop.fs.contract.AbstractFSContractTestBase.assertPathDoesNotExist(AbstractFSContractTestBase.java:305)
        at 
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testFailAbort(ITestS3AOutputCommitter.java:587)


Results :

Failed tests: 
  
ITestS3AOutputCommitter.testAbort:561->Assert.assertArrayEquals:265->Assert.internalArrayEquals:473->Assert.fail:88
 Output directory not empty ls s3a://hwdev-steve-ireland-new/test/testAbort 
[00] 
S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testAbort/part-m-00000; 
isDirectory=false; length=40; replication=1; blocksize=33554432; 
modification_time=1486142401246; access_time=0; owner=stevel; group=stevel; 
permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false
: array lengths differed, expected.length=0 actual.length=1
  
ITestS3AOutputCommitter.testFailAbort:587->AbstractFSContractTestBase.assertPathDoesNotExist:305->Assert.fail:88
 expected output file: unexpectedly found 
s3a://hwdev-steve-ireland-new/test/testFailAbort/part-m-00000 as  
S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testFailAbort/part-m-00000;
 isDirectory=false; length=40; replication=1; blocksize=33554432; 
modification_time=1486142441390; access_time=0; owner=stevel; group=stevel; 
permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false

Tests run: 9, Failures: 2, Errors: 0, Skipped: 0


{code}


s3guard local DB

{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter
Tests run: 9, Failures: 2, Errors: 2, Skipped: 0, Time elapsed: 53.226 sec <<< 
FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter
testMapFileOutputCommitter(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter)
  Time elapsed: 9.394 sec  <<< FAILURE!
java.lang.AssertionError: Number of MapFile.Reader entries in 
s3a://hwdev-steve-ireland-new/test/testMapFileOutputCommitter : ls 
s3a://hwdev-steve-ireland-new/test/testMapFileOutputCommitter [00] 
S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testMapFileOutputCommitter/_SUCCESS;
 isDirectory=false; length=0; replication=1; blocksize=33554432; 
modification_time=1486142538000; access_time=0; owner=stevel; group=stevel; 
permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false
 expected:<1> but was:<0>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at 
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testMapFileOutputCommitter(ITestS3AOutputCommitter.java:500)

testAbort(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter)  Time 
elapsed: 4.307 sec  <<< ERROR!
java.io.FileNotFoundException: No such file or directory: 
s3a://hwdev-steve-ireland-new/test/testAbort
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1765)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1480)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1456)
        at 
org.apache.hadoop.fs.contract.ContractTestUtils.listChildren(ContractTestUtils.java:427)
        at 
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testAbort(ITestS3AOutputCommitter.java:560)

testCommitterWithFailure(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter)
  Time elapsed: 5.375 sec  <<< FAILURE!
java.lang.AssertionError: Expected an exception
        at 
org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:374)
        at 
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.expectFNFEonJobCommit(ITestS3AOutputCommitter.java:396)
        at 
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testCommitterWithFailure(ITestS3AOutputCommitter.java:390)

testFailAbort(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter)  Time 
elapsed: 5.016 sec  <<< ERROR!
java.io.FileNotFoundException: expected output dir: not found 
s3a://hwdev-steve-ireland-new/test/testFailAbort in 
s3a://hwdev-steve-ireland-new/test
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1765)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:125)
        at 
org.apache.hadoop.fs.contract.ContractTestUtils.verifyPathExists(ContractTestUtils.java:773)
        at 
org.apache.hadoop.fs.contract.ContractTestUtils.assertPathExists(ContractTestUtils.java:757)
        at 
org.apache.hadoop.fs.contract.AbstractFSContractTestBase.assertPathExists(AbstractFSContractTestBase.java:294)
        at 
org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testFailAbort(ITestS3AOutputCommitter.java:586)

{code}

I don't know what's up here, but I do know that we're doing too many 
deletes during the commit process; more specifically, we're creating too many 
mock empty dirs. That can be optimised with a "delete, don't care about the 
parent dir" method in writeOperationHelper. 
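
To make the cost concrete, here is a minimal sketch of the two delete styles against a toy in-memory store. Everything in it is hypothetical (the class, the method names and the request counting); it is not the S3A or writeOperationHelper code, only a model of "delete, then recreate the parent marker if the parent is now empty" versus "delete and leave the parent alone".

```java
import java.util.TreeSet;

/**
 * Toy in-memory model of an object store, illustrating the extra requests
 * caused by recreating "mock empty directory" markers on every delete.
 * All names are hypothetical; this is not the S3A implementation.
 */
final class MockDirDeleteSketch {
    private final TreeSet<String> keys = new TreeSet<>();
    int requests; // number of simulated store requests issued so far

    void put(String key) {
        requests++; // models a PUT
        keys.add(key);
    }

    boolean exists(String key) {
        return keys.contains(key); // inspection only, not counted
    }

    /** True if any key other than the marker itself lives under the prefix. */
    boolean hasChildren(String dirPrefix) {
        requests++; // models a LIST
        String next = keys.higher(dirPrefix);
        return next != null && next.startsWith(dirPrefix);
    }

    /** Delete, then recreate the parent marker if the parent is now empty. */
    void deleteAndFixParent(String key) {
        requests++; // models a DELETE
        keys.remove(key);
        String parent = key.substring(0, key.lastIndexOf('/') + 1);
        if (!parent.isEmpty() && !hasChildren(parent)) {
            put(parent); // the "mock empty dir" marker
        }
    }

    /** Delete only; the caller takes responsibility for the parent. */
    void deleteIgnoringParent(String key) {
        requests++; // models a DELETE
        keys.remove(key);
    }

    public static void main(String[] args) {
        MockDirDeleteSketch store = new MockDirDeleteSketch();
        store.put("job/tmp/f1");
        store.put("job/tmp/f2");
        store.deleteAndFixParent("job/tmp/f1");
        store.deleteAndFixParent("job/tmp/f2");
        System.out.println("requests with parent fix-up: " + store.requests);
    }
}
```

In this model every fix-up delete pays for an extra listing, and the last one also pays for a marker write, which is the overhead a committer that is about to delete or overwrite the whole directory anyway does not need.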

In the first s3guard local test, {{testMapFileOutputCommitter}}, all is well in 
the FS used by the job, but when a second FS instance is created in the same 
process, that second instance doesn't see the job's output in its listing.
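
I have not confirmed the cause, so the following is only a toy model of how this symptom could arise, not a diagnosis: it assumes, purely for illustration, that each client instance answers listings from its own private in-memory table. The class and method names are hypothetical and nothing here is taken from the s3guard code.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy model: two clients share one backing store, but each answers
 * listings from a private in-memory table (an assumption made for
 * illustration only). All names are hypothetical.
 */
final class PerInstanceListingSketch {

    /** Stand-in for the backing store shared by every client instance. */
    static final class Store {
        final Set<String> keys = new TreeSet<>();
    }

    /** Client whose listings come from its own table, not from the store. */
    static final class Client {
        private final Store store;
        private final Set<String> listingTable = new TreeSet<>();

        Client(Store store) {
            this.store = store;
        }

        void create(String key) {
            store.keys.add(key);   // the data really is written
            listingTable.add(key); // but only this instance records it
        }

        Set<String> list() {
            return new TreeSet<>(listingTable);
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Client job = new Client(store);
        job.create("out/part-m-00000");
        Client second = new Client(store);
        System.out.println("job sees:    " + job.list());
        System.out.println("second sees: " + second.list());
    }
}
```

If something like this is going on, the fix would be in how the listing state is shared between instances rather than in the committer itself; that remains to be checked.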

I have not looked at the other failures in any detail.

> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, 
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch, 
> HADOOP-13786-HADOOP-13345-004.patch
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider us free to expose the blobstore-ness of the S3 output 
> streams (i.e. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.
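
The "not visible until the close()" idea in the description above can be sketched with a toy model in which written data stays pending until an explicit commit, and an abort simply discards it. This is a hedged illustration only: the class and its methods are hypothetical, it is not the committer in the patch, and the commit here still loops over the pending files (it avoids renames and data copies, nothing more).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of deferred visibility: writes are held as pending uploads
 * and only appear at their destination on commit. Abort drops them.
 * All names are hypothetical.
 */
final class DeferredVisibilitySketch {
    private final Map<String, byte[]> visible = new HashMap<>();
    private final Map<String, byte[]> pending = new HashMap<>();

    /** Record data for a destination without making it visible. */
    void write(String dest, byte[] data) {
        pending.put(dest, data.clone());
    }

    /** Make every pending write visible at its final destination. */
    void commit() {
        visible.putAll(pending);
        pending.clear();
    }

    /** Discard every pending write; nothing appears at the destination. */
    void abort() {
        pending.clear();
    }

    boolean isVisible(String dest) {
        return visible.containsKey(dest);
    }

    int visibleCount() {
        return visible.size();
    }

    public static void main(String[] args) {
        DeferredVisibilitySketch fs = new DeferredVisibilitySketch();
        fs.write("test/testAbort/part-m-00000", new byte[40]);
        fs.abort();
        System.out.println("visible after abort: " + fs.visibleCount());
    }
}
```

Under this model the expectation checked by {{testAbort}} holds by construction: an aborted task never had anything visible in the output directory, so there is nothing to clean up.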



