[jira] [Commented] (HDFS-12866) Recursive delete of a large directory or snapshot makes namenode unresponsive

Daryn Sharp (JIRA) Mon, 04 Dec 2017 10:37:11 -0800

    [ https://issues.apache.org/jira/browse/HDFS-12866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277219#comment-16277219 ]


Daryn Sharp commented on HDFS-12866:
------------------------------------

bq. Indeed, I was thinking of traversing to the root to check, as is done in 
FSNamesystem#isFileDeleted. It costs some time, but we can find out whether an 
INode is disconnected, right?
I thought the parent was nulled for an inodeRef.WithName when it is deleted 
explicitly, or implicitly as the source of a move. The {{FSN#isFileDeleted}} 
implementation shows otherwise and is shockingly bad: it looks up every 
ancestor's child inode in its parent for an equality check.
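The ancestor check under discussion can be sketched as a minimal Python model (not HDFS code). `INode`, `add_child`, and `is_disconnected` are hypothetical names, and the sketch assumes, as the {{isFileDeleted}} observation above suggests, that an unlinked inode may keep its parent pointer, so the only reliable test is to look each node up in its parent's child map. It makes the per-call cost visible: one child lookup per ancestor, paid on every check.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)  # identity semantics, like comparing inode references
class INode:
    """Hypothetical, simplified stand-in for an HDFS inode (not the real class)."""
    name: str
    parent: Optional[INode] = None
    children: Dict[str, INode] = field(default_factory=dict)

    def add_child(self, child: INode) -> INode:
        child.parent = self
        self.children[child.name] = child
        return child


def is_disconnected(inode: INode, root: INode) -> bool:
    """Walk from inode up to root, checking that each node is still linked
    into its parent's child map. Cost is O(depth) lookups per call."""
    node = inode
    while node is not root:
        parent = node.parent
        if parent is None:
            return True  # reached the top of a detached subtree, not the root
        if parent.children.get(node.name) is not node:
            return True  # parent no longer references this node
        node = parent
    return False
```

For example, building `/a/b/f` and then removing only `a` from the root's child map makes `is_disconnected(f, root)` return `True`, even though `f`, `b`, and `a` all still have non-null parent pointers.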

bq. So the main issue of this approach is the cost of traversing to the root to 
check if any ancestor is disconnected? I wonder how bad it is.
Actually, the main issue is: what does a profile reveal? Let's not make 
premature optimizations without solid analysis.

As for the traversal, making it a pervasive check throughout operations 
penalizes the common case for what should be a relatively rare one (deletion 
of a super-large directory). Perhaps once every 1-2 years a massive directory is 
removed and stalls the NN for minutes. I want that danger removed, but not at 
the expense of general performance.

bq. In IBR and FBR, can we assume the file exists if the INode is there?
It will be if only an ancestor is unlinked. I don't have time to look, but I 
have concerns about what happens if a block slated for removal is updated and 
possibly added to other data structures (corrupt, excess, etc.) or, worse, 
generates an edit that cannot be replayed.



> Recursive delete of a large directory or snapshot makes namenode unresponsive
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-12866
>                 URL: https://issues.apache.org/jira/browse/HDFS-12866
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: Yongjun Zhang
>
> Currently, file/directory deletion happens in two steps (see 
> {{FSNamesystem#delete(String src, boolean recursive, boolean logRetryCache)}}):
> # Do the following under the fsn write lock, and release the lock afterwards:
> ** 1.1  recursively traverse the target, collect INodes and all blocks to be 
> deleted
> ** 1.2  delete all INodes
> # Delete the blocks to be deleted incrementally, chunk by chunk. That is, in 
> a loop, do:   
> ** acquire fsn write lock,
> ** delete chunk of blocks
> ** release fsn write lock
> Breaking the deletion into two steps avoids holding the fsn write lock for 
> too long, which would make the NN unresponsive. However, even so, when 
> deleting a large directory, or a snapshot that has a lot of contents, step 1 
> itself can take a long time, so it still holds the fsn write lock for too 
> long and makes the NN unresponsive.
> A possible solution would be to add one more sub-step to step 1, and only 
> hold the fsn write lock in sub-step 1.1:
> * 1.1 hold the fsn write lock, disconnect the target to be deleted from its 
> parent dir, release the lock
> * 1.2 recursively traverse the target, collect INodes and all blocks to be 
> deleted
> * 1.3 delete all INodes
> Then do step 2.
> This means that any operation on any file/dir needs to check whether any of 
> its ancestors has been deleted (disconnected), similar to what's done in the 
> FSNamesystem#isFileDeleted method.
> I'm throwing the thought out here for further discussion. Comments and 
> input are welcome.
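The proposed flow (disconnect under the lock, traverse without it, then remove blocks chunk by chunk) can be sketched as a toy single-process model. Everything here is hypothetical: `Node`, `Namesystem`, `block_map`, and `chunk_size` are illustrative names, a plain `threading.Lock` stands in for the fsn write lock, and inode-map cleanup and edit logging are omitted. The sketch only shows where the lock is held; it does not address the block-report and edit-replay concerns raised in the comment above.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(eq=False)
class Node:
    """Hypothetical inode: directories have children, files have block IDs."""
    name: str
    children: Dict[str, "Node"] = field(default_factory=dict)
    blocks: List[int] = field(default_factory=list)


class Namesystem:
    """Toy namesystem; `lock` stands in for the fsn write lock."""

    def __init__(self, root: Node, chunk_size: int = 1000):
        self.root = root
        self.lock = threading.Lock()
        self.block_map: Set[int] = set()  # stand-in for the blocks map
        self.chunk_size = chunk_size

    def delete(self, parent: Node, name: str) -> int:
        """Delete parent/name; returns how many locked block-removal chunks ran."""
        # 1.1: under the lock, only unlink the target from its parent.
        with self.lock:
            target = parent.children.pop(name)

        # 1.2/1.3: the subtree is now unreachable from the root, so this
        # sketch traverses it without the lock, collecting blocks and
        # dropping child links (real inode cleanup is elided).
        to_delete: List[int] = []
        stack = [target]
        while stack:
            node = stack.pop()
            to_delete.extend(node.blocks)
            stack.extend(node.children.values())
            node.children = {}

        # 2: remove blocks in chunks, releasing the lock between chunks so
        # other operations can interleave.
        chunks = 0
        for i in range(0, len(to_delete), self.chunk_size):
            with self.lock:
                self.block_map.difference_update(to_delete[i:i + self.chunk_size])
            chunks += 1
        return chunks
```

With `chunk_size=2` and five blocks under the deleted directory, the lock is taken once for the unlink and then three times for block removal, rather than once for the entire operation.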



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
