shameersss1 opened a new pull request, #6006: URL: https://github.com/apache/hadoop/pull/6006
### Description of PR Currently concurrent writes are not supported by S3A Magic Committer. When the user tries to write to same parent , but to a different partition/sub-directory, The MPU metadata (.pendingset) of slower running jobs might be deleted by the the jobs which completes first. This happens because, The __magic directory is common across all the jobs and it gets cleanedup after the job completion which might affect the other jobs. ### Proposed Changes 1. Instead of a global magic directory __magic, Each job will have its own magic directory of the format __magic_<jobId> and all the .pendingset are written to that directory. 2. Introduced a new flag `fs.s3a.magic.cleanup.enabled` which is default to true, The clean up of magic directory will happen based on this flag. This was introduced to solve https://issues.apache.org/jira/browse/HADOOP-18568 3. The default value of `fs.s3a.committer.abort.pending.uploads` is set to false to support concurrent writes by default. ### How was this patch tested? 1. Ran S3A Unit Tests 2. Ran S3A Integration test in `us-west-1` region `[INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestDirectoryCommitterScale [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestPaths [INFO] Running org.apache.hadoop.fs.s3a.commit.TestMagicCommitPaths [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestStagingCommitter [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestStagingPartitionedFileListing [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestStagingDirectoryOutputCommitter [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestStagingPartitionedJobCommit [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestStagingPartitionedTaskCommit [INFO] Tests run: 28, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.473 s - in org.apache.hadoop.fs.s3a.commit.TestMagicCommitPaths [INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.611 s - in org.apache.hadoop.fs.s3a.commit.staging.TestPaths [INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.965 s - in org.apache.hadoop.fs.s3a.commit.staging.TestStagingDirectoryOutputCommitter [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.474 s - in org.apache.hadoop.fs.s3a.commit.staging.TestStagingPartitionedFileListing [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.627 s - in org.apache.hadoop.fs.s3a.commit.staging.TestStagingPartitionedJobCommit [INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.249 s - in org.apache.hadoop.fs.s3a.commit.staging.TestStagingPartitionedTaskCommit [INFO] Tests run: 63, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 61.11 s - in org.apache.hadoop.fs.s3a.commit.staging.TestStagingCommitter [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 84.193 s - in org.apache.hadoop.fs.s3a.commit.staging.TestDirectoryCommitterScale [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocol [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestDirectoryCommitProtocol [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestPartitionedCommitProtocol [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.901 s - in .hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure [INFO] Running org.apache.hadoop.fs.s3a.commit.magic.ITestMagicCommitProtocol [INFO] Running org.apache.hadoop.fs.s3a.commit.magic.ITestMagicCommitProtocolFailure [INFO] Running org.apache.hadoop.fs.s3a.commit.integration.ITestS3ACommitterMRJob [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.225 s - in org.apache.hadoop.fs.s3a.commit.magic.ITestMagicCommitProtocolFailure [INFO] Running org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory [INFO] Running org.apache.hadoop.fs.s3a.commit.ITestCommitOperations [INFO] Running org.apache.hadoop.fs.s3a.commit.ITestCommitOperationCost [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.05 s - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 43.398 s - in org.apache.hadoop.fs.s3a.commit.ITestCommitOperationCost [INFO] Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 143.308 s - in org.apache.hadoop.fs.s3a.commit.ITestCommitOperations [INFO] Running org.apache.hadoop.fs.s3a.auth.ITestAssumedRoleCommitOperations [INFO] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 218.466 s - in org.apache.hadoop.fs.s3a.commit.integration.ITestS3ACommitterMRJob [WARNING] Tests run: 18, Failures: 0, Errors: 0, Skipped: 18, Time elapsed: 62.036 s - in org.apache.hadoop.fs.s3a.auth.ITestAssumedRoleCommitOperations [WARNING] Tests run: 24, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 429.367 s - in .hadoop.fs.s3a.commit.staging.integration.ITestPartitionedCommitProtocol [INFO] Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 472.06 s - in .hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocol [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 498.078 s - in .hadoop.fs.s3a.commit.staging.integration.ITestDirectoryCommitProtocol [INFO] Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 861.607 s - in org.apache.hadoop.fs.s3a.commit.magic.ITestMagicCommitProtocol [INFO] Running org.apache.hadoop.fs.s3a.commit.magic.ITestS3AHugeMagicCommits [WARNING] Tests run: 10, Failures: 0, Errors: 0, Skipped: 10, Time elapsed: 29.141 s - in org.apache.hadoop.fs.s3a.commit.magic.ITestS3AHugeMagicCommits [INFO] Running org.apache.hadoop.fs.s3a.commit.terasort.ITestTerasortOnS3A [WARNING] Tests run: 14, Failures: 0, Errors: 0, Skipped: 14, Time elapsed: 47.485 s - in org.apache.hadoop.fs.s3a.commit.terasort.ITestTerasortOnS3A` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
