[
https://issues.apache.org/jira/browse/HADOOP-18568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646997#comment-17646997
]
André F. edited comment on HADOOP-18568 at 12/14/22 8:44 AM:
-------------------------------------------------------------
I ran this experiment again last night using
`spark.hadoop.fs.s3a.bulk.delete.page.size=1000`, and indeed I could see a
difference in the clean-up phase:
{code:java}
2022-12-13T21:33:07.782Z pool-3-thread-10 INFO AbstractS3ACommitter: committing
the output of 426881 task(s): duration 15:23.466s
2022-12-13T21:33:07.806Z pool-3-thread-10 INFO CommitOperations: Starting:
Writing success file bucket/path_hash/_SUCCESS
2022-12-13T21:33:08.189Z pool-3-thread-10 INFO AbstractS3ACommitter: Aborting
all pending commits under s3a://bucket/path_hash: duration 0:00.167s
2022-12-13T21:33:08.189Z pool-3-thread-10 INFO AbstractS3ACommitter: Cleanup
job (no job ID): duration 0:00.167s
2022-12-13T21:33:08.212Z pool-3-thread-10 INFO MagicS3GuardCommitter: Starting:
Deleting magic directory s3a://bucket/path_hash/__magic
2022-12-13T21:34:44.390Z s3a-transfer-bucket-bounded-pool6-t82 WARN
MultiObjectDeleteSupport: Bulk delete operation failed to delete all objects;
failure count = 1
2022-12-13T21:34:44.390Z s3a-transfer-bucket-bounded-pool6-t82 WARN
MultiObjectDeleteSupport: InternalError:
path_hash/__magic/job-ab6af78e-2605-498c-955c-0f67d2e4673c/task_202212131907112569752852349249438_0116_m_316415.pendingset:
We encountered an internal error. Please try again.
2022-12-13T21:43:07.026Z pool-3-thread-10 INFO MagicS3GuardCommitter: Deleting
magic directory s3a://bucket/path_hash/__magic: duration 9:58.814s
2022-12-13T21:43:07.026Z pool-3-thread-10 INFO AbstractS3ACommitter: Task
committer attempt_202212131907033242558538698078218_0000_m_000000_0:
commitJob((no job ID)): duration 26:06.456s
{code}
Still, the overhead of the commit + clean-up is quite big in this case.
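For reference, a minimal sketch of how such a page size can be passed to a Spark job (assuming the standard `spark.hadoop.*` pass-through to the Hadoop configuration; the app name and job body are placeholders):
{code:java}
import org.apache.spark.sql.SparkSession;

public class BulkDeletePageSizeExample {
  public static void main(String[] args) {
    // The spark.hadoop.* prefix forwards the property to the Hadoop
    // configuration as fs.s3a.bulk.delete.page.size.
    // 1000 keys is the most a single S3 DeleteObjects request can carry.
    SparkSession spark = SparkSession.builder()
        .appName("bulk-delete-page-size-example")  // placeholder app name
        .config("spark.hadoop.fs.s3a.bulk.delete.page.size", "1000")
        .getOrCreate();

    // ... job logic using the magic committer goes here ...

    spark.stop();
  }
}
{code}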
> Magic Committer optional clean up
> ----------------------------------
>
> Key: HADOOP-18568
> URL: https://issues.apache.org/jira/browse/HADOOP-18568
> Project: Hadoop Common
> Issue Type: Wish
> Components: fs/s3
> Affects Versions: 3.3.3
> Reporter: André F.
> Priority: Minor
>
> It seems that deleting the `__magic` folder, depending on the number of
> tasks/partitions used by a given Spark job, can take a really long time. I'm
> seeing the following behavior on a Spark job (processing ~30TB, with
> ~420k tasks) using the magic committer:
> {code:java}
> 2022-12-10T21:25:19.629Z pool-3-thread-32 INFO MagicS3GuardCommitter:
> Starting: Deleting magic directory s3a://my-bucket/random_hash/__magic
> 2022-12-10T21:52:03.250Z pool-3-thread-32 INFO MagicS3GuardCommitter:
> Deleting magic directory s3a://my-bucket/random_hash/__magic: duration
> 26:43.620s {code}
> I don't see a way around it, since deleting the S3 objects requires listing
> all the objects under the prefix, and that listing is what seems to be taking
> so much time.
> Could we somehow make this cleanup optional? (The idea would be to delegate it
> to S3 lifecycle policies, so that this overhead is not incurred during the
> commit phase.)
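> For illustration only, a hedged sketch of the kind of lifecycle rule that could
> expire leftover `__magic` objects out of band, using the AWS SDK for Java v1
> (the bucket name, prefix, and expiry period are placeholders; this is not part
> of the committer):
> {code:java}
> import com.amazonaws.services.s3.AmazonS3;
> import com.amazonaws.services.s3.AmazonS3ClientBuilder;
> import com.amazonaws.services.s3.model.BucketLifecycleConfiguration;
> import com.amazonaws.services.s3.model.lifecycle.LifecycleFilter;
> import com.amazonaws.services.s3.model.lifecycle.LifecyclePrefixPredicate;
> import java.util.Collections;
>
> public class MagicLifecycleRuleExample {
>   public static void main(String[] args) {
>     AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
>
>     // Expire anything left under the job's __magic prefix after one day,
>     // instead of deleting it synchronously during the commit phase.
>     // "my-bucket" and "random_hash/__magic/" are placeholders.
>     BucketLifecycleConfiguration.Rule rule = new BucketLifecycleConfiguration.Rule()
>         .withId("expire-magic-committer-leftovers")
>         .withFilter(new LifecycleFilter(
>             new LifecyclePrefixPredicate("random_hash/__magic/")))
>         .withExpirationInDays(1)
>         .withStatus(BucketLifecycleConfiguration.ENABLED);
>
>     s3.setBucketLifecycleConfiguration("my-bucket",
>         new BucketLifecycleConfiguration()
>             .withRules(Collections.singletonList(rule)));
>   }
> }
> {code}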