[ https://issues.apache.org/jira/browse/HADOOP-17881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-17881:
------------------------------------
    Parent Issue: HADOOP-18067  (was: HADOOP-17409)

> S3A DeleteOperation to parallelize POSTing of bulk deletes
> ----------------------------------------------------------
>
>                 Key: HADOOP-17881
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17881
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> Once the need to update the DDB tables is removed, we can go from POSTing a 
> single delete at a time to posting a large set of bulk delete operations in 
> parallel.
> The current design exists to support incremental updates of the S3Guard 
> tables, including handling partial failures; that is no longer a concern.
> This will significantly improve delete() performance on directory trees with 
> many children/descendants, as it goes from a sequence of (#children/1000) 
> POSTs to parallel writes. Each file deleted is still subject to throttling, 
> so we will be limited to 3500 deletes/second; throwing a large pool of 
> workers at the problem would be counter-productive and could cause problems 
> for other applications trying to write to the same directory tree. But we 
> can do better than one POST at a time.
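> A minimal sketch of the idea, not the actual DeleteOperation code (the class 
> name, pool sizing and paging here are assumptions for illustration), using 
> the v1 AWS SDK bulk delete call: split the keys into pages and submit each 
> page's DeleteObjects POST to a bounded executor.
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.CompletableFuture;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
>
> import com.amazonaws.services.s3.AmazonS3;
> import com.amazonaws.services.s3.model.DeleteObjectsRequest;
> import com.amazonaws.services.s3.model.DeleteObjectsRequest.KeyVersion;
>
> /** Illustrative only: POST pages of bulk deletes in parallel. */
> public class ParallelBulkDeleteSketch {
>
>   static void deleteKeys(AmazonS3 s3, String bucket, List<String> keys,
>       int pageSize, int poolSize) {
>     ExecutorService pool = Executors.newFixedThreadPool(poolSize);
>     try {
>       List<CompletableFuture<Void>> posts = new ArrayList<>();
>       for (int start = 0; start < keys.size(); start += pageSize) {
>         List<KeyVersion> page = new ArrayList<>();
>         for (String k : keys.subList(start,
>             Math.min(start + pageSize, keys.size()))) {
>           page.add(new KeyVersion(k));
>         }
>         // Each page becomes one DeleteObjects POST, issued asynchronously
>         // on the bounded pool rather than serially on the caller's thread.
>         posts.add(CompletableFuture.runAsync(() ->
>             s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(page)),
>             pool));
>       }
>       // Block until every POST completes; a failed page surfaces here.
>       CompletableFuture.allOf(
>           posts.toArray(new CompletableFuture[0])).join();
>     } finally {
>       pool.shutdown();
>     }
>   }
> }
> {code}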
> Proposed:
> * if parallel delete is off: no limit.
> * if parallel delete is on: limit the number of parallel POSTs to 
> 3000/page-size, so you'll never have more deletes pending than the write 
> limit of a single shard (see the sketch below).
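> A worked example of that cap (the helper name and the treatment of the 
> "off" case are assumptions): with a page size of, say, 250 keys, 
> 3000/250 = 12 concurrent POSTs, keeping at most 3000 keys in flight, under 
> the 3500 writes/second a single shard sustains.
> {code:java}
> /**
>  * Hypothetical helper showing the proposed sizing rule.
>  */
> static int maxActiveDeletePosts(boolean parallelDelete, int pageSize) {
>   if (!parallelDelete) {
>     // Parallel delete off: pages are POSTed one at a time, so no
>     // explicit cap on concurrency is needed.
>     return 1;
>   }
>   // On: never have more keys pending deletion than the ~3000 write
>   // limit of a single shard, e.g. pageSize 250 -> 12 concurrent POSTs.
>   return Math.max(1, 3000 / pageSize);
> }
> {code}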


