[ 
https://issues.apache.org/jira/browse/HDFS-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958001#comment-13958001
 ] 

Jing Zhao commented on HDFS-6186:
---------------------------------

When the NN is processing the first block report, it looks like it currently
skips any block that does not belong to a file:
{code}
      // If block does not belong to any file, we are done.
      if (storedBlock == null) continue;
{code}
Such a block is only added to the invalidate list when the following block
report is processed (which will be one hour later). So as a first step, could
we show on the WebUI the number of blocks that the NN cannot recognize in the
first block report?
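A rough sketch of that first step, for discussion only. FirstReportScanner and its methods are hypothetical names, not the real BlockManager API; block IDs stand in for real Block objects. Instead of a bare `continue`, the scan also counts unrecognized blocks so the total could be surfaced on the WebUI or JMX:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not actual HDFS code): while processing a first
 * full block report, count blocks that belong to no file instead of
 * skipping them silently.
 */
class FirstReportScanner {
  // Block IDs the NameNode knows about from its namespace (assumed input).
  private final Set<Long> knownBlocks;
  // Running total of unrecognized blocks seen in first block reports.
  private long unrecognizedBlocks = 0;

  FirstReportScanner(Set<Long> knownBlocks) {
    this.knownBlocks = new HashSet<>(knownBlocks);
  }

  /** Processes one first block report; returns the unrecognized block IDs. */
  List<Long> processFirstReport(List<Long> reportedBlockIds) {
    List<Long> unknown = new ArrayList<>();
    for (long id : reportedBlockIds) {
      if (!knownBlocks.contains(id)) {
        // The current code just does `continue` here; we also record it.
        unrecognizedBlocks++;
        unknown.add(id);
        continue;
      }
      // ... normal handling of a recognized block would go here ...
    }
    return unknown;
  }

  /** Value a hypothetical WebUI/JMX metric could display. */
  long getUnrecognizedBlockCount() {
    return unrecognizedBlocks;
  }
}
```

Deletion behavior is unchanged in this sketch; it only makes the count visible so an admin can spot a suspiciously large number before the next report invalidates those blocks.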

Meanwhile, if we restart only the NN while the DNs are still running (e.g., we
restart the SBN while the ANN is still running and then trigger a failover),
the NN may currently process an IBR (incremental block report) before a full
block report. In that case, the first full block report sent to the NN after
its restart can trigger block deletion immediately.
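A toy model of that ordering problem, under a loudly stated assumption: the NN takes the lenient first-report path only when it has no blocks recorded yet for the storage, which is how an earlier IBR would cause the later full report to go through the regular path. StorageModel and its methods are hypothetical, not HDFS classes:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of one DataNode storage as seen by a restarted NN.
 * Assumption: the first-report path is chosen only when nothing is recorded
 * for the storage yet; it skips unknown blocks, whereas the regular path
 * queues them for deletion right away.
 */
class StorageModel {
  private final Set<Long> namespaceBlocks;            // blocks that belong to files
  private final Set<Long> recorded = new HashSet<>(); // blocks recorded for this storage
  final List<Long> invalidateQueue = new ArrayList<>();

  StorageModel(Set<Long> namespaceBlocks) {
    this.namespaceBlocks = namespaceBlocks;
  }

  /** Incremental block report: records one newly received block. */
  void incrementalReport(long blockId) {
    if (namespaceBlocks.contains(blockId)) {
      recorded.add(blockId);
    }
  }

  /** Full block report; the path taken depends on what is already recorded. */
  void fullReport(List<Long> blockIds) {
    boolean firstReportPath = recorded.isEmpty();
    for (long id : blockIds) {
      if (namespaceBlocks.contains(id)) {
        recorded.add(id);
      } else if (!firstReportPath) {
        // Regular path: the unknown block is scheduled for deletion now.
        invalidateQueue.add(id);
      }
      // First-report path: the unknown block is skipped (deletion deferred).
    }
  }
}
```

With no IBR, a full report containing an unknown block leaves the invalidate queue empty; if a single IBR arrives first, the same full report queues the unknown block for deletion immediately.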

> Pause deletion of blocks when the namenode starts up
> ----------------------------------------------------
>
>                 Key: HDFS-6186
>                 URL: https://issues.apache.org/jira/browse/HDFS-6186
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Suresh Srinivas
>
> HDFS namenode can delete blocks very quickly, given that deletion happens as a
> parallel operation spread across many datanodes. One of the frequent
> anxieties I see is that a lot of data can be deleted very quickly when a
> cluster is brought up, especially when one of the storage directories has
> failed and namenode metadata was copied from another storage. Copying wrong
> metadata would result in some of the newer files (if old metadata was
> copied) being deleted along with their blocks.
> HDFS-5986 now captures the number of pending deletion blocks on the namenode
> webUI and JMX. I propose pausing deletion of blocks for a configured period of
> time (default 1 hour?) after namenode comes out of safemode. This will give
> enough time for the administrator to notice a large number of pending deletion
> blocks and take corrective action.
> Thoughts?
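One possible shape for the proposed pause, as a sketch only: DeletionPause, its methods, and the injected clock are hypothetical names, and the pause length stands in for whatever config key would be chosen. Deletions keep being queued (so the pending count stays visible), but none are handed out until the pause has elapsed after leaving safemode:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of the proposal: hold back block deletions for a
 * configured period after the NameNode leaves safemode, while still
 * exposing how many deletions are pending.
 */
class DeletionPause {
  private final long pauseMillis;        // configured pause, e.g. 1 hour
  private final LongSupplier clock;      // injectable clock (ms) for testing
  private long safemodeExitMillis = -1;  // -1 means still in safemode
  private final Deque<Long> pending = new ArrayDeque<>();

  DeletionPause(long pauseMillis, LongSupplier clock) {
    this.pauseMillis = pauseMillis;
    this.clock = clock;
  }

  /** Records the time the NameNode leaves safemode. */
  void leaveSafemode() {
    safemodeExitMillis = clock.getAsLong();
  }

  /** Queues a block for deletion; it is not deleted yet. */
  void scheduleDeletion(long blockId) {
    pending.add(blockId);
  }

  /** The number an admin would watch on the webUI/JMX. */
  long getPendingDeletionCount() {
    return pending.size();
  }

  /** Returns up to maxBlocks block IDs to delete now; empty while paused. */
  List<Long> pollDeletions(int maxBlocks) {
    List<Long> batch = new ArrayList<>();
    boolean paused = safemodeExitMillis < 0
        || clock.getAsLong() - safemodeExitMillis < pauseMillis;
    if (paused) {
      return batch; // still in safemode or within the pause window
    }
    while (batch.size() < maxBlocks && !pending.isEmpty()) {
      batch.add(pending.poll());
    }
    return batch;
  }
}
```

Keeping the queue intact during the pause is the point: if the pending count looks wrong, the admin can stop the NN and restore the right metadata before anything is actually deleted.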



--
This message was sent by Atlassian JIRA
(v6.2#6252)
