[
https://issues.apache.org/jira/browse/HDFS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745764#action_12745764
]
Raghu Angadi commented on HDFS-173:
-----------------------------------
The above looks better, and its correctness is simpler to show.
The key observation is that logging each block deletion is the main culprit
(80-90% of the time spent). The above method moves block deletion outside
the main deletion.
With the patch, when someone deletes a directory tree with 10M files, it might
still lock the NN for on the order of a minute.
Another approach could be to emulate a client's recursive deletion. This would
not involve any semantic changes to NN internals.
It would be something like:
{code}
delete(dir) {
  lastDeleted = "";
  while (lastDeleted != dir) {
    subTree = findSubTree(dir, max_children = 1000); // under lock
    delete(subTree);                                 // under lock
    lastDeleted = subTree;
  }
}
{code}
This essentially breaks one deletion into many deletions for large trees. It
also implies there would be many deletion entries in the edit log, which is not
an issue.
findSubTree() would not iterate through the entire tree each time. It stops
the depth-first traversal at an inner node if the number of files (or blocks)
under it is larger than the given limit.
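To make the idea concrete, here is a minimal in-memory sketch of the loop above. It is not NameNode code: Node, its cached size field, detach() and the ReentrantLock standing in for the FSNamesystem lock are all assumptions made for illustration. It also assumes each node keeps a count of the entries under it, so findSubTree() only walks one path down the tree, and it takes the lock once per batch rather than separately for the find and the delete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class BatchedDelete {
    // Hypothetical stand-in for an inode; not an HDFS class.
    static final class Node {
        final String name;
        final Node parent;
        final List<Node> children = new ArrayList<>();
        int size = 1; // entries in this subtree, including this node

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
                // Keep the cached subtree counts of all ancestors up to date.
                for (Node a = parent; a != null; a = a.parent) a.size++;
            }
        }
    }

    // Descend along one path until the subtree is within the limit
    // (or a leaf is reached). Only one path is walked, not the whole tree.
    static Node findSubTree(Node dir, int maxNodes) {
        Node n = dir;
        while (n.size > maxNodes && !n.children.isEmpty()) {
            n = n.children.get(n.children.size() - 1);
        }
        return n;
    }

    // Unlink the subtree from its parent and fix the ancestors' counts.
    static void detach(Node sub) {
        if (sub.parent == null) return;
        sub.parent.children.remove(sub);
        for (Node a = sub.parent; a != null; a = a.parent) a.size -= sub.size;
    }

    // Delete dir in bounded batches; the lock is released between batches
    // so other operations can interleave. Returns the number of batches.
    static int delete(Node dir, int maxNodes, ReentrantLock lock) {
        int batches = 0;
        Node lastDeleted = null;
        while (lastDeleted != dir) {
            lock.lock();
            try {
                Node subTree = findSubTree(dir, maxNodes);
                detach(subTree);
                lastDeleted = subTree;
            } finally {
                lock.unlock();
            }
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        Node root = new Node("/", null);
        Node big = new Node("big", root);
        for (int d = 0; d < 10; d++) {
            Node sub = new Node("d" + d, big);
            for (int f = 0; f < 99; f++) new Node("f" + f, sub);
        }
        int batches = delete(big, 100, new ReentrantLock());
        System.out.println("batches=" + batches + ", entries left=" + root.size);
    }
}
```

In this example the 1001-entry tree goes away in 11 lock acquisitions (ten subdirectories of 100 entries each, then the emptied directory itself). One caveat of the sketch: a single directory with far more direct children than the limit degenerates into one batch per file, so a real version would probably want to remove up to the limit of siblings per batch.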
> Recursively deleting a directory with millions of files makes NameNode
> unresponsive for other commands until the deletion completes
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-173
> URL: https://issues.apache.org/jira/browse/HDFS-173
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Attachments: HDFS-173.patch
>
>
> Delete a directory with millions of files. This could take several minutes
> (observed 12 mins for 9 million files). While the operation is in progress
> FSNamesystem lock is held and the requests from clients are not handled until
> deletion completes.