[
https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742423#comment-17742423
]
Emanuel Velzi commented on HADOOP-18797:
----------------------------------------
The requirements are not exactly the same, but it's possible that the final
implementation can cover both cases.
To clarify, my intention is to clean the job staging directory after
committing. I don't want to have a separate job to handle that task. However, I
need more control over what I'm deleting to ensure determinism.
If the implementation aligns with your second comment in HADOOP-18568,
{+}_where the clean-up wouldn't be necessary_{+}, it could address the scenario
I described.
Regarding the statement _"we should still allow for __magic cleanup so as to
avoid leakage",_ I'm not entirely convinced. A single job should not
necessarily be responsible for cleaning up the remnants of previously failed
jobs. The same issue arises with the staging committers: in the past, I had to
create a job specifically designed to remove the ".staging-" directories that
remained after job failures (a sketch of that kind of sweep is below).
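For illustration, a minimal sketch of that kind of sweep, assuming a
standalone pass with a hypothetical bucket/table path and a 7-day cutoff
(this is not the actual job I wrote; FileSystem, FileStatus and Path are the
standard Hadoop APIs):
{code:java}
// Hypothetical cleanup pass: delete leftover ".staging-" directories
// older than a cutoff, so failed jobs don't accumulate remnants.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket/"), conf);
long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7);
for (FileStatus st : fs.listStatus(new Path("s3a://my-bucket/my-table/"))) {
  if (st.isDirectory()
      && st.getPath().getName().startsWith(".staging-")
      && st.getModificationTime() < cutoff) {
    fs.delete(st.getPath(), true); // recursive delete of the stale directory
  }
}
{code}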
Maybe the implementation could be a mix:
* delete each task attempt dir as the job goes along, and
* add a new option, _fs.s3a.cleanup.magic.enabled_, to allow or disallow a
final cleanup, so that we don't leave remnants of failed jobs in the
directory (for concurrent jobs it would be disabled); see the sketch below.
Finally, {_}"running multiple jobs writing into the same dir is always pretty
risky"{_}: I have heard this before, but I'm not entirely sure about the
specific risks involved (beyond this case with magic committers). If you have
any insights or examples to share, I would greatly appreciate it!
> S3A committer fix lost data on concurrent jobs
> ----------------------------------------------
>
> Key: HADOOP-18797
> URL: https://issues.apache.org/jira/browse/HADOOP-18797
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Reporter: Emanuel Velzi
> Priority: Major
>
> There is a failure in the commit process when multiple jobs are writing to a
> s3 directory *concurrently* using {*}magic committers{*}.
> This issue is closely related to HADOOP-17318.
> When multiple Spark jobs write to the same S3A directory, they upload files
> simultaneously using "__magic" as the base directory for staging. Inside this
> directory, there are multiple "/job-some-uuid" directories, each representing
> a concurrently running job.
> To fix some problems related to concurrency, a property was introduced by
> the earlier fix: "spark.hadoop.fs.s3a.committer.abort.pending.uploads". When
> set to false, it ensures that during the cleanup stage a finalizing job does
> not abort pending uploads belonging to other jobs. So we see this line in the logs:
> {code:java}
> DEBUG [main] o.a.h.fs.s3a.commit.AbstractS3ACommitter (819): Not cleanup up
> pending uploads to s3a ...{code}
> (from
> [AbstractS3ACommitter.java#L952|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L952])
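> For reference, a minimal way to set that switch programmatically (in Spark
> the same key is passed with the "spark.hadoop." prefix shown above):
> {code:java}
> // Keep other jobs' pending uploads alive during this job's cleanup.
> Configuration conf = new Configuration();
> conf.setBoolean("fs.s3a.committer.abort.pending.uploads", false);
> {code}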
> However, in the next step, the {*}"__magic" directory is recursively
> deleted{*}:
> {code:java}
> INFO [main] o.a.h.fs.s3a.commit.magic.MagicS3GuardCommitter (98): Deleting
> magic directory s3a://my-bucket/my-table/__magic: duration 0:00.560s {code}
> (from [AbstractS3ACommitter.java#L1112|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L1112] and
> [MagicS3GuardCommitter.java#L137|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L137])
> This deletion operation *affects the second job*, which is still running,
> because it loses its pending uploads (i.e., the ".pendingset" and ".pending"
> files). The consequences range from an exception in the best case to a
> silent loss of data in the worst case. The latter occurs when Job_1 deletes
> the files just before Job_2 executes "listPendingUploadsToCommit" to list
> the ".pendingset" files in its job attempt directory, before completing the
> uploads with POST requests.
> To resolve this issue, it's important {*}to ensure that only the prefix
> associated with the job currently finalizing is cleaned{*}.
> Here's a possible solution:
> {code:java}
> /**
>  * Delete only this job's magic directory, leaving other jobs'
>  * subtrees under __magic untouched.
>  */
> public void cleanupStagingDirs() {
>   final Path out = getOutputPath();
>   // Previously the whole __magic directory was deleted:
>   // Path path = magicSubdir(getOutputPath());
>   // Instead, scope the deletion to this job's own subdirectory.
>   Path path = new Path(magicSubdir(out), formatJobDir(getUUID()));
>   try (DurationInfo ignored = new DurationInfo(LOG, true,
>       "Deleting magic directory %s", path)) {
>     Invoker.ignoreIOExceptions(LOG, "cleanup magic directory",
>         path.toString(),
>         () -> deleteWithWarning(getDestFS(), path, true));
>   }
> }
> {code}
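> To illustrate the scoping (the job UUIDs here are hypothetical), only the
> finalizing job's subtree would be removed:
> {code:java}
> // Layout with two concurrent jobs:
> //   s3a://my-bucket/my-table/__magic/job-1111/  <- Job_1 (finalizing)
> //   s3a://my-bucket/my-table/__magic/job-2222/  <- Job_2 (still running)
> Path magic = new Path("s3a://my-bucket/my-table", "__magic");
> // The old behavior deleted "magic" recursively, destroying Job_2's pending
> // files; the fix deletes only Job_1's subtree:
> System.out.println("delete " + new Path(magic, "job-1111"));
> // -> delete s3a://my-bucket/my-table/__magic/job-1111
> {code}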
>
> The side effect of this fix is that the "__magic" directory itself is never
> cleaned up. However, I believe this is a minor concern, especially
> considering that other artifacts, such as the "_SUCCESS" marker, also
> persist after jobs end.