Daniel Carl Jones created HADOOP-18420:
------------------------------------------
Summary: Optimise S3A’s recursive delete to drop successful S3
keys on retry of S3 DeleteObjects
Key: HADOOP-18420
URL: https://issues.apache.org/jira/browse/HADOOP-18420
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Reporter: Daniel Carl Jones
S3A users with large filesystems performing renames or deletes can run into
throttling when S3A issues bulk deletes of keys. These are currently sent in
batches of 250 keys
([https://github.com/apache/hadoop/blob/c1d82cd95e375410cb0dffc2931063d48687386f/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java#L319-L323]).
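As a rough illustration only, the page size is exposed through the option
defined at the link above (assumed here to be fs.s3a.bulk.delete.page.size,
with 100 as an arbitrary example value), so a client can shrink each
DeleteObjects call while this issue is open:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch of an interim mitigation, not part of this proposal:
// lower the bulk delete page size so each DeleteObjects request
// carries fewer keys and a throttled retry re-sends less work.
Configuration conf = new Configuration();
conf.setInt("fs.s3a.bulk.delete.page.size", 100);
{code}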
When the bulk delete ([S3
DeleteObjects|https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html])
fails for some keys, the response lists which keys failed and why. Today, S3A
recovers from throttling by resending the DeleteObjects request unchanged. This
re-deletes keys that have already succeeded, and those extra mutations count
towards the same throttling limits.
Instead, S3A should retry only the keys that failed, limiting the number of
mutations against the S3 bucket and hopefully mitigating errors when deleting a
large number of objects.
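A minimal sketch of the proposed retry path, assuming the v1 AWS SDK classes
S3A currently uses (MultiObjectDeleteException, DeleteObjectsRequest); the
class name, MAX_ATTEMPTS budget, and retry loop are illustrative only, not the
existing S3A code:

{code:java}
import java.util.List;
import java.util.stream.Collectors;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.model.DeleteObjectsRequest.KeyVersion;
import com.amazonaws.services.s3.model.MultiObjectDeleteException;

/**
 * Illustrative only: delete one page of keys and, on partial failure,
 * retry just the keys reported as failed rather than the whole page.
 */
public final class PartialDeleteRetryExample {

  private static final int MAX_ATTEMPTS = 3;   // hypothetical retry budget

  public static void deleteWithPartialRetry(AmazonS3 s3,
      String bucket, List<KeyVersion> keys) {
    List<KeyVersion> pending = keys;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS && !pending.isEmpty(); attempt++) {
      try {
        s3.deleteObjects(new DeleteObjectsRequest(bucket)
            .withKeys(pending)
            .withQuiet(true));
        return;                                // every pending key was deleted
      } catch (MultiObjectDeleteException e) {
        // Keep only the keys S3 reported as failed; keys that already
        // succeeded are dropped so they are not re-deleted and do not
        // count again towards throttling limits. A real implementation
        // would also back off between attempts.
        pending = e.getErrors().stream()
            .map(MultiObjectDeleteException.DeleteError::getKey)
            .map(KeyVersion::new)
            .collect(Collectors.toList());
      }
    }
    if (!pending.isEmpty()) {
      throw new IllegalStateException("Failed to delete " + pending.size()
          + " keys after " + MAX_ATTEMPTS + " attempts");
    }
  }
}
{code}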