[ 
https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214340#comment-14214340
 ] 

Zhe Zhang commented on HDFS-7374:
---------------------------------

The failure is in {{testDecommissionWithNamenodeRestart}} and points to an 
interesting case: when the NN restarts, its DN {{isAlive}} bits are not turned 
on immediately. This will trigger our logic of immediately decommissioning 
already dead DNs.

We can move the logic back to {{refreshDatanodes}}. This way it only applies 
when the decomm command is triggered by the user/admin, instead of from 
{{registerDatanode}} -> {{checkDecommissioning}} -> {{startDecommission}}. 

Or we can update the logic of the {{isAlive}} bit -- maybe it should be turned on 
when a DN registers itself.
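
To make the two options concrete, here is a rough, self-contained sketch. It is a toy model, not the actual patch: the class {{DecommSketch}}, the {{Node}} type, the {{shortcutDead}} flag and the three {{register*}} variants are hypothetical names made up for illustration. Only the call path {{registerDatanode}} -> {{checkDecommissioning}} -> {{startDecommission}} and the {{isAlive}} bit come from the discussion above.

```java
// Toy model only. Names and signatures are assumptions for illustration
// and do not match the real DatanodeManager code.
class DecommSketch {
    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    static class Node {
        boolean isAlive = false;              // not yet turned on after a NN restart
        AdminState state = AdminState.NORMAL;
    }

    // Hypothetical core transition. When shortcutDead is true, a node that
    // looks dead is marked DECOMMISSIONED at once instead of IN_PROGRESS.
    static void startDecommission(Node n, boolean shortcutDead) {
        if (n.state != AdminState.NORMAL) {
            return;                           // already decommissioning or done
        }
        if (shortcutDead && !n.isAlive) {
            n.state = AdminState.DECOMMISSIONED;
        } else {
            n.state = AdminState.DECOMMISSION_INPROGRESS;
        }
    }

    // Admin-triggered path: the dead-node shortcut applies here.
    static void refreshDatanodes(Node excludedNode) {
        startDecommission(excludedNode, true);
    }

    // Failing behavior: the registration path reaches the shortcut while
    // isAlive is still false, so a healthy DN is decommissioned immediately.
    static void registerFailing(Node excludedNode) {
        startDecommission(excludedNode, true);
    }

    // Option 1: the shortcut lives only in refreshDatanodes, so the
    // registration path never takes it.
    static void registerOption1(Node excludedNode) {
        startDecommission(excludedNode, false);
    }

    // Option 2: turn isAlive on at registration, so the shortcut sees
    // the registering DN as alive.
    static void registerOption2(Node excludedNode) {
        excludedNode.isAlive = true;
        startDecommission(excludedNode, true);
    }

    public static void main(String[] args) {
        Node a = new Node();
        registerFailing(a);
        System.out.println("failing: " + a.state);

        Node b = new Node();
        registerOption1(b);
        System.out.println("option 1: " + b.state);

        Node c = new Node();
        registerOption2(c);
        System.out.println("option 2: " + c.state);
    }
}
```

In this model both options leave a re-registering DN in {{DECOMMISSION_INPROGRESS}}, while a node that is dead when the admin refreshes the exclude list still goes straight to {{DECOMMISSIONED}}. The difference is that option 2 also changes what {{isAlive}} means for every other caller.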

> Allow decommissioning of dead DataNodes
> ---------------------------------------
>
>                 Key: HDFS-7374
>                 URL: https://issues.apache.org/jira/browse/HDFS-7374
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-7374-001.patch, HDFS-7374-002.patch
>
>
> We have seen the use case of decommissioning DataNodes that are already dead 
> or unresponsive, and not expected to rejoin the cluster.
> The logic introduced by HDFS-6791 will mark those nodes as 
> {{DECOMMISSION_INPROGRESS}}, with a hope that they can come back and finish 
> the decommission work. If an upper layer application is monitoring the 
> decommissioning progress, it will hang forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
