[jira] [Commented] (HDFS-12866) Recursive delete of a large directory or snapshot makes namenode unresponsive

Daryn Sharp (JIRA) Wed, 29 Nov 2017 10:25:34 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-12866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16271296#comment-16271296
 ]


Daryn Sharp commented on HDFS-12866:
------------------------------------

I've contemplated this exact approach for a long time and wish I had done it 
eons ago because Kihwal is right: snapshots have made it much harder.

There's a high likelihood you will need the lock in step 2.1 for inode-based 
file operations, and updating the snapshot diffs probably isn't thread-safe.  
Perhaps batched locking, like the block deletion uses, might work.  Then you 
need to consider possible issues such as incorrect quota computation if the 
live copy of the file is moved during the background delete of the snapshot.  
Maybe I'm wrong, or maybe there are more tricky cases.
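
As a rough illustration of the batched-locking idea, here is a minimal sketch 
in plain Java.  It is not HDFS code: BatchedLockSketch, processInBatches and 
the ReentrantReadWriteLock standing in for the fsn lock are all hypothetical 
names chosen for the example.  The write lock is taken once per batch and 
released between batches so other operations can get in.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names exist in HDFS.
class BatchedLockSketch {
    // Stand-in for the namesystem lock (assumption, not the real fsn lock).
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    private int lockAcquisitions = 0;

    // Applies 'action' to every item, holding the write lock for at most
    // 'batchSize' items at a time and releasing it between batches.
    <T> void processInBatches(List<T> items, int batchSize, Consumer<T> action) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            fsLock.writeLock().lock();
            try {
                lockAcquisitions++;
                for (T item : items.subList(start, end)) {
                    action.accept(item);
                }
            } finally {
                // Other readers/writers may run before the next batch starts.
                fsLock.writeLock().unlock();
            }
        }
    }

    int getLockAcquisitions() {
        return lockAcquisitions;
    }
}
```

Note that the sketch says nothing about what may change between batches; a 
rename of the live file in that window is exactly the kind of case that would 
need separate handling.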

bq.  any operations on any file/dir need to check if its ancestor is deleted
Unless you have a means to avoid a traverse-to-root check for each block 
processed in an IBR/FBR (incremental/full block report), it's _not even an 
option_.
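
To make the cost concrete, here is a small sketch of what a naive "is any 
ancestor disconnected" check looks like.  It is an illustration only: Node, 
hasDeletedAncestor and the hops counter are hypothetical and are not the HDFS 
INode classes or FSNamesystem#isFileDeleted.  Every call walks parent pointers 
until it runs out of parents, so the work is proportional to the depth of the 
path.

```java
// Hypothetical sketch; not the HDFS INode implementation.
class AncestorCheckSketch {
    static final class Node {
        final String name;
        Node parent; // null for the root, or after the node is disconnected

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // Total parent-pointer hops so far, kept only to make the cost visible.
    long hops = 0;

    // Returns true if the parent chain of 'node' does not end at 'root',
    // i.e. some ancestor (or the node itself) has been disconnected.
    boolean hasDeletedAncestor(Node node, Node root) {
        Node cur = node;
        while (cur.parent != null) {
            cur = cur.parent;
            hops++;
        }
        return cur != root;
    }
}
```

Running this once per block means a report with N blocks for files at depth d 
costs on the order of N * d hops, which is the overhead being objected to.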

Stepping back, I'd suggest actually profiling the NN.  Perhaps it's changed, 
but a major contributor to the slowness in the past was multiple subdir 
traversals to check permissions, compute quota, look for snapshots, etc.; the 
actual block collection paled in comparison.  Permission checks are degraded 
further when ACLs are used.  It gets really bad when using an external inode 
attribute provider.

> Recursive delete of a large directory or snapshot makes namenode unresponsive
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-12866
>                 URL: https://issues.apache.org/jira/browse/HDFS-12866
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: Yongjun Zhang
>
> Currently file/directory deletion happens in two steps (see 
> {{FSNamesystem#delete(String src, boolean recursive, boolean logRetryCache)}}):
> # Do the following under fsn write lock and release the lock afterwards
> ** 1.1  recursively traverse the target, collect INodes and all blocks to be 
> deleted
> ** 1.2  delete all INodes
> # Delete the blocks to be deleted incrementally, chunk by chunk. That is, in 
> a loop, do:   
> ** acquire fsn write lock,
> ** delete chunk of blocks
> ** release fsn write lock
> Breaking the deletion into two steps avoids holding the fsn write lock for 
> too long, which would make the NN unresponsive. However, even with this, when 
> deleting a large directory, or a snapshot that has a lot of contents, step 1 
> itself takes a long time, so it still holds the fsn write lock for too long 
> and makes the NN unresponsive.
> A possible solution would be to add one more sub step in step 1, and only 
> hold fsn write lock in sub step 1.1:
> * 1.1. hold the fsn write lock, disconnect the target to be deleted from its 
> parent dir, release the lock
> * 1.2 recursively traverse the target, collect INodes and all blocks to be 
> deleted
> * 1.3  delete all INodes
> Then do step 2.
> This means, any operations on any file/dir need to check if its ancestor is 
> deleted (ancestor is disconnected), similar to what's done in 
> FSNamesystem#isFileDeleted method.
> I'm putting the idea out here for further discussion. Comments and input are 
> welcome.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

