[
https://issues.apache.org/jira/browse/HADOOP-18568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646997#comment-17646997
]
André F. edited comment on HADOOP-18568 at 12/14/22 8:44 AM:
-------------------------------------------------------------
I ran this experiment again last night using
`spark.hadoop.fs.s3a.bulk.delete.page.size=1000`, and indeed I could see a
difference in the clean-up phase:
{code:java}
2022-12-13T21:33:07.782Z pool-3-thread-10 INFO AbstractS3ACommitter: committing
the output of 426881 task(s): duration 15:23.466s
2022-12-13T21:33:07.806Z pool-3-thread-10 INFO CommitOperations: Starting:
Writing success file bucket/path_hash/_SUCCESS
2022-12-13T21:33:08.189Z pool-3-thread-10 INFO AbstractS3ACommitter: Aborting
all pending commits under s3a://bucket/path_hash: duration 0:00.167s
2022-12-13T21:33:08.189Z pool-3-thread-10 INFO AbstractS3ACommitter: Cleanup
job (no job ID): duration 0:00.167s
2022-12-13T21:33:08.212Z pool-3-thread-10 INFO MagicS3GuardCommitter: Starting:
Deleting magic directory s3a://bucket/path_hash/__magic
2022-12-13T21:34:44.390Z s3a-transfer-bucket-bounded-pool6-t82 WARN
MultiObjectDeleteSupport: Bulk delete operation failed to delete all objects;
failure count = 1
2022-12-13T21:34:44.390Z s3a-transfer-bucket-bounded-pool6-t82 WARN
MultiObjectDeleteSupport: InternalError:
path_hash/__magic/job-ab6af78e-2605-498c-955c-0f67d2e4673c/task_202212131907112569752852349249438_0116_m_316415.pendingset:
We encountered an internal error. Please try again.
2022-12-13T21:43:07.026Z pool-3-thread-10 INFO MagicS3GuardCommitter: Deleting
magic directory s3a://bucket/path_hash/__magic: duration 9:58.814s
2022-12-13T21:43:07.026Z pool-3-thread-10 INFO AbstractS3ACommitter: Task
committer attempt_202212131907033242558538698078218_0000_m_000000_0:
commitJob((no job ID)): duration 26:06.456s
{code}
Still, the overhead of the commit + clean-up is quite big in this case.
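For reference, a minimal sketch of how such a page size can be passed to a Spark job (assuming the standard `spark.hadoop.*` pass-through to the Hadoop configuration; the app name and job body are placeholders):
{code:java}
import org.apache.spark.sql.SparkSession;

public class BulkDeletePageSizeExample {
  public static void main(String[] args) {
    // The spark.hadoop.* prefix forwards the property to the Hadoop
    // configuration as fs.s3a.bulk.delete.page.size.
    // 1000 keys is the most a single S3 DeleteObjects request can carry.
    SparkSession spark = SparkSession.builder()
        .appName("bulk-delete-page-size-example")  // placeholder app name
        .config("spark.hadoop.fs.s3a.bulk.delete.page.size", "1000")
        .getOrCreate();

    // ... job logic using the magic committer goes here ...

    spark.stop();
  }
}
{code}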
> Magic Committer optional clean up
> ----------------------------------
>
> Key: HADOOP-18568
> URL: https://issues.apache.org/jira/browse/HADOOP-18568
> Project: Hadoop Common
> Issue Type: Wish
> Components: fs/s3
> Affects Versions: 3.3.3
> Reporter: André F.
> Priority: Minor
>
> It seems that deleting the `__magic` folder, depending on the number of
> tasks/partitions used by a given Spark job, can take a really long time. I'm
> seeing the following behavior on a Spark job (processing ~30TB, with
> ~420k tasks) using the magic committer:
> {code:java}
> 2022-12-10T21:25:19.629Z pool-3-thread-32 INFO MagicS3GuardCommitter:
> Starting: Deleting magic directory s3a://my-bucket/random_hash/__magic
> 2022-12-10T21:52:03.250Z pool-3-thread-32 INFO MagicS3GuardCommitter:
> Deleting magic directory s3a://my-bucket/random_hash/__magic: duration
> 26:43.620s {code}
> I don't see a way around it, since deleting the S3 objects requires listing
> all the objects under the prefix, and that listing is what seems to be taking
> so much time.
> Could we somehow make this cleanup optional? (The idea would be to delegate it
> to S3 lifecycle policies, so that this overhead is not incurred during the
> commit phase.)
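> For illustration only, a hedged sketch of the kind of lifecycle rule that could
> expire leftover `__magic` objects out of band, using the AWS SDK for Java v1
> (the bucket name, prefix, and expiry period are placeholders; this is not part
> of the committer):
> {code:java}
> import com.amazonaws.services.s3.AmazonS3;
> import com.amazonaws.services.s3.AmazonS3ClientBuilder;
> import com.amazonaws.services.s3.model.BucketLifecycleConfiguration;
> import com.amazonaws.services.s3.model.lifecycle.LifecycleFilter;
> import com.amazonaws.services.s3.model.lifecycle.LifecyclePrefixPredicate;
> import java.util.Collections;
>
> public class MagicLifecycleRuleExample {
>   public static void main(String[] args) {
>     AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
>
>     // Expire anything left under the job's __magic prefix after one day,
>     // instead of deleting it synchronously during the commit phase.
>     // "my-bucket" and "random_hash/__magic/" are placeholders.
>     BucketLifecycleConfiguration.Rule rule = new BucketLifecycleConfiguration.Rule()
>         .withId("expire-magic-committer-leftovers")
>         .withFilter(new LifecycleFilter(
>             new LifecyclePrefixPredicate("random_hash/__magic/")))
>         .withExpirationInDays(1)
>         .withStatus(BucketLifecycleConfiguration.ENABLED);
>
>     s3.setBucketLifecycleConfiguration("my-bucket",
>         new BucketLifecycleConfiguration()
>             .withRules(Collections.singletonList(rule)));
>   }
> }
> {code}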