Steve Loughran created HADOOP-17881:
---------------------------------------

             Summary: S3A DeleteOperation to parallelize POSTing of bulk deletes
                 Key: HADOOP-17881
                 URL: https://issues.apache.org/jira/browse/HADOOP-17881
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 3.4.0
            Reporter: Steve Loughran

Once the need to update the DDB tables is removed, we can go from POSTing a single bulk delete at a time to posting a large set of bulk delete operations in parallel. The current design exists to support incremental update of S3Guard tables, including handling partial failures. That is not a problem anymore.

This will significantly improve delete() performance on directory trees with many children/descendants, as it goes from a sequence of children/1000 POSTs to parallel writes.

As each deleted file is still throttled, we will be limited to 3500 deletes/second, so throwing a large pool of workers at the problem would be counter-productive and could cause problems for other applications trying to write to the same directory tree. But we can do better than one POST at a time.

Proposed:
* if parallel delete is off: no limit
* if parallel delete is on: limit the number of parallel POSTs to 3000/page-size, so you will never have more updates pending than the write limit of a single shard.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
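The proposed bound can be sketched roughly as below. This is a minimal illustration, not the S3A implementation: the class and method names (ParallelBulkDelete, parallelLimit, deleteAll) are hypothetical, and postBulkDelete is a stand-in for a real S3 DeleteObjects call. It shows the key idea: size the worker pool at 3000/page-size so pending deletes never exceed one shard's write capacity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBulkDelete {

    // Budget kept below the ~3500 deletes/second shard write limit,
    // per the proposal above.
    static final int WRITE_BUDGET = 3000;

    /** Cap parallelism so pending deletes never exceed the shard write budget. */
    static int parallelLimit(int pageSize) {
        return Math.max(1, WRITE_BUDGET / pageSize);
    }

    /** POST each page of keys as a bulk delete, bounded by the parallel limit. */
    static int deleteAll(List<List<String>> pages, int pageSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelLimit(pageSize));
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> page : pages) {
                results.add(pool.submit(() -> postBulkDelete(page)));
            }
            int deleted = 0;
            for (Future<Integer> f : results) {
                deleted += f.get();   // surfaces any per-page failure as an exception
            }
            return deleted;
        } finally {
            pool.shutdown();
        }
    }

    // Simulated POST: a real client would issue an S3 DeleteObjects request here.
    static int postBulkDelete(List<String> keys) {
        return keys.size();
    }
}
```

With the default bulk-delete page size of 1000 keys, this yields 3 concurrent POSTs; a smaller page size allows proportionally more parallelism while keeping the total pending deletes under the budget.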