[ 
https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760718#comment-17760718
 ] 

ASF GitHub Bot commented on HADOOP-18797:
-----------------------------------------

shameersss1 opened a new pull request, #6006:
URL: https://github.com/apache/hadoop/pull/6006

   ### Description of PR
   
   Currently, concurrent writes are not supported by the S3A magic committer. When
   users write to the same parent directory but to different partitions/sub-directories,
   the multipart upload (MPU) metadata (.pendingset files) of slower-running jobs may be
   deleted by the job that completes first.
   
   This happens because the __magic directory is shared across all jobs and is cleaned
   up on job completion, which can delete staging data belonging to other jobs that are
   still running, as illustrated in the sketch below.
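
   A minimal illustration of the collision, with hypothetical bucket, job and task names (this is not committer code):

   ```java
   import org.apache.hadoop.fs.Path;

   // Hypothetical sketch: with a single shared __magic prefix, every concurrent
   // job stages its .pendingset metadata under the same root, so whichever job
   // finishes first and deletes __magic recursively also removes the metadata
   // of jobs that are still running.
   public class SharedMagicDirSketch {
     public static void main(String[] args) {
       Path dest = new Path("s3a://my-bucket/my-table");
       Path sharedMagic = new Path(dest, "__magic");

       // both jobs stage under the same prefix
       Path job1Pending = new Path(sharedMagic, "job-1111/task-0001.pendingset");
       Path job2Pending = new Path(sharedMagic, "job-2222/task-0001.pendingset");

       // If Job 1 commits first and recursively deletes sharedMagic, Job 2's
       // pendingset is gone before Job 2 can read it to complete its uploads.
       System.out.println(job1Pending);
       System.out.println(job2Pending);
     }
   }
   ```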
   
   ### Proposed Changes
   
   1. Instead of a single global magic directory `__magic`, each job gets its own
   magic directory of the form `__magic_<jobId>`, and all of its .pendingset files
   are written there (see the sketch after this list).
   
   2. Introduced a new flag `fs.s3a.magic.cleanup.enabled`, defaulting to true, which
   controls whether the magic directory is cleaned up. This flag was introduced to
   address https://issues.apache.org/jira/browse/HADOOP-18568
   
   3. The default value of `fs.s3a.committer.abort.pending.uploads` is changed to
   false so that concurrent writes are supported by default.
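
   A rough sketch of how these settings and the per-job layout fit together; the class and helper names below are illustrative only, while the two configuration keys are the ones described above:

   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;

   public class PerJobMagicDirSketch {
     // each job stages under its own __magic_<jobId> prefix (illustrative helper)
     static Path jobMagicDir(Path dest, String jobId) {
       return new Path(dest, "__magic_" + jobId);
     }

     public static void main(String[] args) {
       Path dest = new Path("s3a://my-bucket/my-table");
       System.out.println(jobMagicDir(dest, "job-1111"));
       // -> s3a://my-bucket/my-table/__magic_job-1111

       Configuration conf = new Configuration();
       // proposed default true: each job cleans up only its own magic directory
       conf.setBoolean("fs.s3a.magic.cleanup.enabled", true);
       // proposed default false: do not abort other jobs' pending uploads on cleanup
       conf.setBoolean("fs.s3a.committer.abort.pending.uploads", false);
     }
   }
   ```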
   
   
   ### How was this patch tested?
   
   1. Ran the S3A unit tests.
   
   2. Ran the S3A integration tests in the `us-west-1` region:
   
   ```
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestDirectoryCommitterScale
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestPaths
   [INFO] Running org.apache.hadoop.fs.s3a.commit.TestMagicCommitPaths
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestStagingCommitter
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestStagingPartitionedFileListing
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestStagingDirectoryOutputCommitter
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestStagingPartitionedJobCommit
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.TestStagingPartitionedTaskCommit
   [INFO] Tests run: 28, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.473 s - in org.apache.hadoop.fs.s3a.commit.TestMagicCommitPaths
   [INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.611 s - in org.apache.hadoop.fs.s3a.commit.staging.TestPaths
   [INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.965 s - in org.apache.hadoop.fs.s3a.commit.staging.TestStagingDirectoryOutputCommitter
   [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.474 s - in org.apache.hadoop.fs.s3a.commit.staging.TestStagingPartitionedFileListing
   [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.627 s - in org.apache.hadoop.fs.s3a.commit.staging.TestStagingPartitionedJobCommit
   [INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.249 s - in org.apache.hadoop.fs.s3a.commit.staging.TestStagingPartitionedTaskCommit
   [INFO] Tests run: 63, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 61.11 s - in org.apache.hadoop.fs.s3a.commit.staging.TestStagingCommitter
   [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 84.193 s - in org.apache.hadoop.fs.s3a.commit.staging.TestDirectoryCommitterScale
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocol
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestDirectoryCommitProtocol
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestPartitionedCommitProtocol
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.901 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure
   [INFO] Running org.apache.hadoop.fs.s3a.commit.magic.ITestMagicCommitProtocol
   [INFO] Running org.apache.hadoop.fs.s3a.commit.magic.ITestMagicCommitProtocolFailure
   [INFO] Running org.apache.hadoop.fs.s3a.commit.integration.ITestS3ACommitterMRJob
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.225 s - in org.apache.hadoop.fs.s3a.commit.magic.ITestMagicCommitProtocolFailure
   [INFO] Running org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
   [INFO] Running org.apache.hadoop.fs.s3a.commit.ITestCommitOperations
   [INFO] Running org.apache.hadoop.fs.s3a.commit.ITestCommitOperationCost
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.05 s - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
   [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 43.398 s - in org.apache.hadoop.fs.s3a.commit.ITestCommitOperationCost
   [INFO] Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 143.308 s - in org.apache.hadoop.fs.s3a.commit.ITestCommitOperations
   [INFO] Running org.apache.hadoop.fs.s3a.auth.ITestAssumedRoleCommitOperations
   [INFO] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 218.466 s - in org.apache.hadoop.fs.s3a.commit.integration.ITestS3ACommitterMRJob
   [WARNING] Tests run: 18, Failures: 0, Errors: 0, Skipped: 18, Time elapsed: 62.036 s - in org.apache.hadoop.fs.s3a.auth.ITestAssumedRoleCommitOperations
   [WARNING] Tests run: 24, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 429.367 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestPartitionedCommitProtocol
   [INFO] Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 472.06 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocol
   [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 498.078 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestDirectoryCommitProtocol
   [INFO] Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 861.607 s - in org.apache.hadoop.fs.s3a.commit.magic.ITestMagicCommitProtocol
   [INFO] Running org.apache.hadoop.fs.s3a.commit.magic.ITestS3AHugeMagicCommits
   [WARNING] Tests run: 10, Failures: 0, Errors: 0, Skipped: 10, Time elapsed: 29.141 s - in org.apache.hadoop.fs.s3a.commit.magic.ITestS3AHugeMagicCommits
   [INFO] Running org.apache.hadoop.fs.s3a.commit.terasort.ITestTerasortOnS3A
   [WARNING] Tests run: 14, Failures: 0, Errors: 0, Skipped: 14, Time elapsed: 47.485 s - in org.apache.hadoop.fs.s3a.commit.terasort.ITestTerasortOnS3A
   ```
   
   
   
   
   




> S3A committer fix lost data on concurrent jobs
> ----------------------------------------------
>
>                 Key: HADOOP-18797
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18797
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>            Reporter: Emanuel Velzi
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>
> There is a failure in the commit process when multiple jobs write to an S3 
> directory *concurrently* using {*}magic committers{*}.
> This issue is closely related to HADOOP-17318.
> When multiple Spark jobs write to the same S3A directory, they upload files 
> simultaneously using "__magic" as the base directory for staging. Inside this 
> directory, there are multiple "/job-some-uuid" directories, each representing 
> a concurrently running job.
> To fix some problems related to concurrency, a property was introduced in 
> the previous fix: "spark.hadoop.fs.s3a.committer.abort.pending.uploads". When 
> set to false, it ensures that during the cleanup stage, finalizing jobs do 
> not abort pending uploads from other jobs. So we see this line in the logs: 
> {code:java}
> DEBUG [main] o.a.h.fs.s3a.commit.AbstractS3ACommitter (819): Not cleanup up 
> pending uploads to s3a ...{code}
> (from 
> [AbstractS3ACommitter.java#L952|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L952])
> However, in the next step, the {*}"__magic" directory is recursively 
> deleted{*}:
> {code:java}
> INFO  [main] o.a.h.fs.s3a.commit.magic.MagicS3GuardCommitter (98): Deleting 
> magic directory s3a://my-bucket/my-table/__magic: duration 0:00.560s {code}
> (from 
> [AbstractS3ACommitter.java#L1112|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L1112] 
> and 
> [MagicS3GuardCommitter.java#L137|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L137])
> This deletion operation *affects the second job* that is still running 
> because it loses pending uploads (i.e., ".pendingset" and ".pending" files).
> The consequences can range from an exception in the best case to a silent 
> loss of data in the worst case. The latter occurs when Job_1 deletes files 
> just before Job_2 executes "listPendingUploadsToCommit" to list the ".pendingset" 
> files in the job attempt directory, before completing the uploads with POST 
> requests.
> To resolve this issue, it's important {*}to ensure that only the prefix 
> associated with the job currently finalizing is cleaned{*}.
> Here's a possible solution:
> {code:java}
> /**
>  * Delete the magic directory.
>  */
> public void cleanupStagingDirs() {
>   final Path out = getOutputPath();
>   // previously the whole shared magic subdir was deleted:
>   // Path path = magicSubdir(out);
>   // instead, scope the deletion to this job's own subdirectory
>   Path path = new Path(magicSubdir(out), formatJobDir(getUUID()));
>   try (DurationInfo ignored = new DurationInfo(LOG, true,
>       "Deleting magic directory %s", path)) {
>     Invoker.ignoreIOExceptions(LOG, "cleanup magic directory", path.toString(),
>         () -> deleteWithWarning(getDestFS(), path, true));
>   }
> } {code}
>  
> The side effect of this fix is that the "__magic" directory itself is never 
> cleaned up. However, I believe this is a minor concern, even considering that 
> other artifacts such as the "_SUCCESS" marker also persist after jobs end.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
