[ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851781#comment-15851781 ]
Steve Loughran commented on HADOOP-13786:
-----------------------------------------

About to add a new patch, which does more in terms of testing, though there's some ambiguity about the semantics of commit and abort that I need to clarify with the MR Team.

h2. This code works without s3guard, but fails (differently) with s3guard local and s3guard dynamo

s3guard DDB
{code}
Running org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter
Tests run: 9, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 94.915 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter
testAbort(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter) Time elapsed: 9.292 sec <<< FAILURE!
java.lang.AssertionError: Output directory not empty ls s3a://hwdev-steve-ireland-new/test/testAbort
[00] S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testAbort/part-m-00000; isDirectory=false; length=40; replication=1; blocksize=33554432; modification_time=1486142401246; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false
: array lengths differed, expected.length=0 actual.length=1
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.internal.ComparisonCriteria.assertArraysAreSameLength(ComparisonCriteria.java:71)
    at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:32)
    at org.junit.Assert.internalArrayEquals(Assert.java:473)
    at org.junit.Assert.assertArrayEquals(Assert.java:265)
    at org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testAbort(ITestS3AOutputCommitter.java:561)

testFailAbort(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter) Time elapsed: 9.453 sec <<< FAILURE!
java.lang.AssertionError: expected output file: unexpectedly found s3a://hwdev-steve-ireland-new/test/testFailAbort/part-m-00000 as S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testFailAbort/part-m-00000; isDirectory=false; length=40; replication=1; blocksize=33554432; modification_time=1486142441390; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.hadoop.fs.contract.ContractTestUtils.assertPathDoesNotExist(ContractTestUtils.java:796)
    at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.assertPathDoesNotExist(AbstractFSContractTestBase.java:305)
    at org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testFailAbort(ITestS3AOutputCommitter.java:587)

Results :

Failed tests:
  ITestS3AOutputCommitter.testAbort:561->Assert.assertArrayEquals:265->Assert.internalArrayEquals:473->Assert.fail:88 Output directory not empty ls s3a://hwdev-steve-ireland-new/test/testAbort
[00] S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testAbort/part-m-00000; isDirectory=false; length=40; replication=1; blocksize=33554432; modification_time=1486142401246; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false
: array lengths differed, expected.length=0 actual.length=1
  ITestS3AOutputCommitter.testFailAbort:587->AbstractFSContractTestBase.assertPathDoesNotExist:305->Assert.fail:88 expected output file: unexpectedly found s3a://hwdev-steve-ireland-new/test/testFailAbort/part-m-00000 as S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testFailAbort/part-m-00000; isDirectory=false; length=40; replication=1; blocksize=33554432; modification_time=1486142441390; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false

Tests run: 9, Failures: 2, Errors: 0, Skipped: 0
{code}

s3guard local DB
{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter
Tests run: 9, Failures: 2, Errors: 2, Skipped: 0, Time elapsed: 53.226 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter
testMapFileOutputCommitter(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter) Time elapsed: 9.394 sec <<< FAILURE!
java.lang.AssertionError: Number of MapFile.Reader entries in s3a://hwdev-steve-ireland-new/test/testMapFileOutputCommitter : ls s3a://hwdev-steve-ireland-new/test/testMapFileOutputCommitter
[00] S3AFileStatus{path=s3a://hwdev-steve-ireland-new/test/testMapFileOutputCommitter/_SUCCESS; isDirectory=false; length=0; replication=1; blocksize=33554432; modification_time=1486142538000; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false} isEmptyDirectory=false
 expected:<1> but was:<0>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testMapFileOutputCommitter(ITestS3AOutputCommitter.java:500)

testAbort(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter) Time elapsed: 4.307 sec <<< ERROR!
java.io.FileNotFoundException: No such file or directory: s3a://hwdev-steve-ireland-new/test/testAbort
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1765)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1480)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1456)
    at org.apache.hadoop.fs.contract.ContractTestUtils.listChildren(ContractTestUtils.java:427)
    at org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testAbort(ITestS3AOutputCommitter.java:560)

testCommitterWithFailure(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter) Time elapsed: 5.375 sec <<< FAILURE!
java.lang.AssertionError: Expected an exception
    at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:374)
    at org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.expectFNFEonJobCommit(ITestS3AOutputCommitter.java:396)
    at org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testCommitterWithFailure(ITestS3AOutputCommitter.java:390)

testFailAbort(org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter) Time elapsed: 5.016 sec <<< ERROR!
java.io.FileNotFoundException: expected output dir: not found s3a://hwdev-steve-ireland-new/test/testFailAbort in s3a://hwdev-steve-ireland-new/test
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1765)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:125)
    at org.apache.hadoop.fs.contract.ContractTestUtils.verifyPathExists(ContractTestUtils.java:773)
    at org.apache.hadoop.fs.contract.ContractTestUtils.assertPathExists(ContractTestUtils.java:757)
    at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.assertPathExists(AbstractFSContractTestBase.java:294)
    at org.apache.hadoop.fs.s3a.commit.ITestS3AOutputCommitter.testFailAbort(ITestS3AOutputCommitter.java:586)
{code}

I don't know what's up here, but I do know that we're doing too many deletes during the commit process; more specifically, creating too many mock empty dirs. That can be optimised with a "delete, don't care about a parent dir" method in writeOperationHelper.

In the first s3guard local test, {{testMapFileOutputCommitter}}, all is well in the FS used by the job, but when a new FS instance is created in the same process, the second one isn't seeing the listing.

I've not looked at the other failures in any detail.
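The cost of the "mock empty dirs" mentioned above can be illustrated with a toy model. This is only a sketch under stated assumptions: {{ToyBlobStore}} and all of its methods are hypothetical names invented for illustration, it is an in-memory set of keys rather than S3A, and it does not show what writeOperationHelper actually does or will do. It assumes a delete that recreates an empty directory marker when the parent prefix becomes empty, and compares it with a delete that ignores the parent.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy in-memory blob store. Hypothetical; NOT the S3A implementation. */
class ToyBlobStore {
    private final Set<String> keys = new HashSet<>();
    private int markerPuts = 0; // counts "mock empty dir" objects created

    void put(String key) { keys.add(key); }

    int markerPuts() { return markerPuts; }

    /** Delete a key, then recreate an empty-directory marker if the parent prefix is now empty. */
    void deleteAndMaintainParent(String key) {
        keys.remove(key);
        String parent = key.substring(0, key.lastIndexOf('/') + 1);
        boolean parentHasChildren = keys.stream()
                .anyMatch(k -> k.startsWith(parent) && !k.equals(parent));
        if (!parent.isEmpty() && !parentHasChildren) {
            keys.add(parent);   // the extra write this sketch is counting
            markerPuts++;
        }
    }

    /** Delete a key and do nothing about the parent directory. */
    void deleteIgnoringParent(String key) { keys.remove(key); }

    /** Delete one file in each of three task dirs; return how many markers were created. */
    static int countMarkers(boolean maintainParent) {
        ToyBlobStore store = new ToyBlobStore();
        String[] files = {
            "out/_temporary/t1/part-0",
            "out/_temporary/t2/part-0",
            "out/_temporary/t3/part-0",
        };
        for (String f : files) { store.put(f); }
        for (String f : files) {
            if (maintainParent) { store.deleteAndMaintainParent(f); }
            else { store.deleteIgnoringParent(f); }
        }
        return store.markerPuts();
    }

    public static void main(String[] args) {
        System.out.println("maintaining parent: " + countMarkers(true) + " marker writes");
        System.out.println("ignoring parent:    " + countMarkers(false) + " marker writes");
    }
}
```

In this toy scenario every delete under a task directory triggers one marker write, all of which are wasted if the caller is about to remove the whole temporary tree anyway; the parent-ignoring variant avoids them.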
> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13786-HADOOP-13345-001.patch,
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch,
> HADOOP-13786-HADOOP-13345-004.patch
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the
> presence of failures". Implement it, including whatever is needed to
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard
> provides a consistent view of the presence/absence of blobs, show that we can
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output
> streams (ie. not visible until the close()), if we need to use that to allow
> us to abort commit operations.