[ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200625#comment-14200625 ]
Zhe Zhang commented on HDFS-7374:
---------------------------------

This use case has a (slightly) conflicting objective with that of HDFS-6791. I can think of two options to accommodate both scenarios:
# Set a timeout (e.g., 10 minutes) limiting how long a dead DN can stay in the DECOMMISSION_INPROGRESS state.
# If a DN is already dead when decommissioning starts, indicating that the user is intentionally decommissioning a dead node, allow it to enter the decommission-complete state directly.

[~mingma] and [~jingzhao], please advise whether these look reasonable to you, and whether you prefer one over the other, or suggest any other approaches. Thanks!

> Allow decommissioning of dead DataNodes
> ---------------------------------------
>
>                 Key: HDFS-7374
>                 URL: https://issues.apache.org/jira/browse/HDFS-7374
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>
> We have seen the use case of decommissioning DataNodes that are already dead
> or unresponsive, and not expected to rejoin the cluster.
> The logic introduced by HDFS-6791 will mark those nodes as
> {{DECOMMISSION_INPROGRESS}}, with a hope that they can come back and finish
> the decommission work. If an upper layer application is monitoring the
> decommissioning progress, it will hang forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
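The two options proposed in the comment could be sketched roughly as below. This is a minimal illustration, not actual HDFS code: the class, the field names ({{dead}}, {{decommissionStartTime}}), and the timeout constant are all hypothetical stand-ins for whatever the NameNode's decommission monitor would actually use.

```java
import java.util.concurrent.TimeUnit;

public class DeadNodeDecommissionSketch {

    // Option 1: cap how long a dead DN may linger in DECOMMISSION_INPROGRESS.
    // 10 minutes matches the example timeout suggested in the comment.
    static final long DEAD_DECOMM_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(10);

    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    // Hypothetical stand-in for the NameNode's per-DataNode bookkeeping.
    static class DatanodeInfo {
        boolean dead;                 // no heartbeat within the dead interval
        long decommissionStartTime;   // when decommission was requested
        AdminState state = AdminState.NORMAL;
    }

    /** Option 2: decommission request on an already-dead node completes immediately. */
    static void startDecommission(DatanodeInfo dn, long nowMs) {
        dn.decommissionStartTime = nowMs;
        if (dn.dead) {
            // The user is intentionally decommissioning a dead node;
            // do not wait for it to come back and replicate blocks.
            dn.state = AdminState.DECOMMISSIONED;
        } else {
            dn.state = AdminState.DECOMMISSION_INPROGRESS;
        }
    }

    /** Option 1: periodic monitor check that times out dead in-progress nodes. */
    static void checkDecommissionTimeout(DatanodeInfo dn, long nowMs) {
        if (dn.state == AdminState.DECOMMISSION_INPROGRESS
                && dn.dead
                && nowMs - dn.decommissionStartTime > DEAD_DECOMM_TIMEOUT_MS) {
            // The node died (or was already dead) and stayed in progress too
            // long; stop waiting so upper-layer monitors do not hang forever.
            dn.state = AdminState.DECOMMISSIONED;
        }
    }
}
```

Either path ends with the node leaving DECOMMISSION_INPROGRESS, which addresses the hanging-monitor problem described in the issue while still letting live nodes finish replication normally.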