[ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354892#comment-16354892
 ] 

Steve Loughran commented on HADOOP-15209:
-----------------------------------------

Update. +[~sanjay.radia]

An LRU cache of recently used directories should be sufficient. We may 
encounter the situation of calling delete() on a path which doesn't exist (its 
parent was deleted and evicted from the cache), but that should be rare. As the 
newly deleted dir will be added to the cache, the cache will recover from the 
situation for subsequent child entries of the newly deleted directory.
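
As a rough illustration of the idea above (not the code in the attached
patch), a bounded LRU record of deleted directories could look like the
following. The class name DeletedDirCache, the method shouldDelete() and the
use of plain string paths are assumptions made for this sketch; it relies
only on java.util.LinkedHashMap in access order, so its heap use is capped
by maxEntries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a bounded LRU record of directories already deleted.
 * Paths are plain absolute strings such as "/a/b"; this is not DistCp code.
 */
class DeletedDirCache {
  private final Map<String, Boolean> deletedDirs;

  DeletedDirCache(final int maxEntries) {
    // accessOrder=true keeps entries ordered least-recently-used first,
    // so removeEldestEntry evicts the LRU entry once the cap is exceeded.
    this.deletedDirs = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /**
   * Decide whether delete() needs to be issued for a path.
   * Returns false if a cached ancestor directory was already deleted.
   * A directory that does need deleting is recorded for its descendants.
   */
  boolean shouldDelete(String path, boolean isDirectory) {
    String parent = parentOf(path);
    while (parent != null) {
      // get() rather than containsKey() so a hit refreshes the LRU position.
      if (deletedDirs.get(parent) != null) {
        return false;
      }
      parent = parentOf(parent);
    }
    if (isDirectory) {
      deletedDirs.put(path, Boolean.TRUE);
    }
    return true;
  }

  /** Parent of an absolute path; null once the root has been reached. */
  private static String parentOf(String path) {
    int slash = path.lastIndexOf('/');
    return slash <= 0 ? null : path.substring(0, slash);
  }
}
```

If "/a" has been evicted, shouldDelete("/a/sub", true) returns true and a
redundant delete() is issued once, but "/a/sub" is then cached, so entries
beneath it are skipped again. That is the recovery behaviour described above.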

> PoC: DistCp to eliminate needless deletion of files under deleted directories
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-15209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15209
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>         Attachments: HADOOP-15209-001.patch
>
>
> DistCp issues a delete(file) request even if the file is underneath an 
> already deleted directory. This generates needless load on 
> filesystems/object stores, and, if the store throttles delete requests, can 
> dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, 
> then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not 
> overload the heap of the process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

