[ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214340#comment-14214340 ]
Zhe Zhang commented on HDFS-7374:
---------------------------------

The failure is in {{testDecommissionWithNamenodeRestart}} and points to an interesting case: when the NN restarts, the {{isAlive}} bits of its DNs are not turned on immediately. This triggers our logic for immediately decommissioning DNs that are already dead.

We can move the logic back to {{refreshDatanodes}}. That way it applies only when the decommission command is triggered by the user/admin, and not on the path {{registerDatanode}} -> {{checkDecommissioning}} -> {{startDecommission}}.

Or we can update the logic of the {{isAlive}} bit: maybe it should be turned on when a DN registers itself.

> Allow decommissioning of dead DataNodes
> ---------------------------------------
>
>                 Key: HDFS-7374
>                 URL: https://issues.apache.org/jira/browse/HDFS-7374
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-7374-001.patch, HDFS-7374-002.patch
>
>
> We have seen the use case of decommissioning DataNodes that are already dead
> or unresponsive, and not expected to rejoin the cluster.
> The logic introduced by HDFS-6791 will mark those nodes as
> {{DECOMMISSION_INPROGRESS}}, in the hope that they can come back and finish
> the decommission work. If an upper-layer application is monitoring the
> decommissioning progress, it will hang forever.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
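
To make the restart case in the comment above concrete, here is a minimal toy model in Java. It is only a sketch and not the actual Hadoop code: the class {{DecommissionSketch}}, the {{Node}} and {{Variant}} types, and all method bodies are assumptions made up for illustration. It compares the behavior the comment describes (the dead-node shortcut running on the registration path) with the two proposed alternatives.

```java
/** Toy model of the NameNode-restart race; none of this is real Hadoop code. */
class DecommissionSketch {
    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    /** Hypothetical variants: the behavior described, and the two proposed fixes. */
    enum Variant { SHORTCUT_ON_REGISTER, SHORTCUT_ONLY_ON_REFRESH, ALIVE_ON_REGISTER }

    static class Node {
        boolean isAlive = false;               // not yet turned on after an NN restart
        AdminState state = AdminState.NORMAL;
    }

    /** Plain decommission start: the node is expected to drain its replicas. */
    static void startDecommission(Node n) {
        n.state = AdminState.DECOMMISSION_INPROGRESS;
    }

    /** Dead-node shortcut: a node flagged as dead is marked done immediately. */
    static void startDecommissionWithShortcut(Node n) {
        if (n.isAlive) {
            startDecommission(n);
        } else {
            n.state = AdminState.DECOMMISSIONED;
        }
    }

    /** A live, excluded DN re-registers right after the NN restarts. */
    static AdminState registerAfterRestart(Variant v) {
        Node n = new Node();
        if (v == Variant.ALIVE_ON_REGISTER) {
            n.isAlive = true;                  // option 2: registration turns the bit on
        }
        if (v == Variant.SHORTCUT_ONLY_ON_REFRESH) {
            startDecommission(n);              // option 1: no shortcut on this path
        } else {
            startDecommissionWithShortcut(n);
        }
        return n.state;
    }

    /** Admin-triggered refresh for a node that really is dead. */
    static AdminState refreshDeadNode() {
        Node n = new Node();
        startDecommissionWithShortcut(n);
        return n.state;
    }

    public static void main(String[] args) {
        for (Variant v : Variant.values()) {
            System.out.println(v + " -> " + registerAfterRestart(v));
        }
        System.out.println("refresh of dead node -> " + refreshDeadNode());
    }
}
```

In this model, only the {{SHORTCUT_ON_REGISTER}} variant marks a re-registering (and therefore live) node as {{DECOMMISSIONED}} without draining it. Both alternatives leave it in {{DECOMMISSION_INPROGRESS}}, while a truly dead node handled through the admin-triggered refresh is still finished at once.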