[ https://issues.apache.org/jira/browse/HADOOP-18420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612686#comment-17612686 ]
Steve Loughran commented on HADOOP-18420:
-----------------------------------------
I think when deleting fake dirs we should also use Invoke.once() rather than a
retrying invoker (sketch below):
* it's only cleanup
* retained markers aren't a problem for recent clients
* maybe permissions problems are the cause of these failures
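For reference, a minimal sketch of what that could look like, assuming
Invoke.once() refers to the Invoker.once(action, path, operation) helper and
that a void-returning overload is available; the surrounding names
(BulkDeleter, deleteObjectKeys) are placeholders, not S3A's actual internals:
{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.Invoker;

public class FakeDirCleanup {

  /** Placeholder for whatever issues the actual bulk delete call. */
  interface BulkDeleter {
    void deleteObjectKeys(List<String> keys) throws IOException;
  }

  /**
   * Best-effort removal of parent directory markers: a single attempt and
   * no retry policy, since retained markers are harmless to recent clients.
   */
  static void deleteUnnecessaryFakeDirectories(Path path, List<String> keys,
      BulkDeleter deleter) throws IOException {
    Invoker.once("delete fake directories", path.toString(),
        () -> deleter.deleteObjectKeys(keys));
  }
}
{code}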
> Optimise S3A’s recursive delete to drop successful S3 keys on retry of S3
> DeleteObjects
> ---------------------------------------------------------------------------------------
>
> Key: HADOOP-18420
> URL: https://issues.apache.org/jira/browse/HADOOP-18420
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Daniel Carl Jones
> Priority: Major
>
> S3A users with large filesystems performing renames or deletes can run into
> throttling when S3A performs a bulk delete on keys. These deletes are
> currently issued in batches of 250 keys
> ([https://github.com/apache/hadoop/blob/c1d82cd95e375410cb0dffc2931063d48687386f/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java#L319-L323]).
> When the bulk delete ([S3
> DeleteObjects|https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html])
> fails, it provides a list of keys that failed and why. Today, S3A recovers
> from throttles by sending the DeleteObjects request again with no change.
> This re-sends deletes for keys that were already removed, adding extra
> mutations that count towards the same throttling limits. Instead, S3A should
> retry only the keys that failed, limiting the number of mutations against the
> S3 bucket and hopefully mitigating errors when deleting a large number of
> objects.
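A sketch of the proposed behaviour, outside S3A's actual code: it assumes the
v1 AWS SDK (com.amazonaws), where a partially failed DeleteObjects surfaces as
a MultiObjectDeleteException listing the failed keys; class and method names
here are illustrative only, not the patch.
{code:java}
import java.util.List;
import java.util.stream.Collectors;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.model.DeleteObjectsRequest.KeyVersion;
import com.amazonaws.services.s3.model.MultiObjectDeleteException;

public class PartialBulkDeleteRetry {

  /**
   * Delete one page of keys, retrying only the keys S3 reports as failed
   * rather than resubmitting the whole batch.
   */
  public static void deleteWithPartialRetry(AmazonS3 s3, String bucket,
      List<KeyVersion> keys, int maxAttempts) {
    List<KeyVersion> remaining = keys;
    for (int attempt = 1; ; attempt++) {
      try {
        s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(remaining));
        return;                                 // whole batch deleted
      } catch (MultiObjectDeleteException e) {
        if (attempt >= maxAttempts) {
          throw e;
        }
        // Keys absent from getErrors() were deleted; only resubmit failures.
        remaining = e.getErrors().stream()
            .map(MultiObjectDeleteException.DeleteError::getKey)
            .map(KeyVersion::new)
            .collect(Collectors.toList());
        // A real implementation would also back off before retrying.
      }
    }
  }
}
{code}
Note that a throttle of the whole request (a 503 on the call itself) would
still surface as a plain service exception and would need the existing retry
policy; the per-key path above only covers partial failures reported in the
DeleteObjects response.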