[ https://issues.apache.org/jira/browse/HDFS-15761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258452#comment-17258452 ]
Ye Ni edited comment on HDFS-15761 at 1/4/21, 7:45 PM:
-------------------------------------------------------

cc [~mingma], [~andrew.wang], [~zhz], [~inigoiri]


was (Author: nickyye):
cc [~mingma], [~andrew.wang], [~aiden_zhang], [~inigoiri]

> Dead NORMAL DN shouldn't transit to DECOMMISSIONED immediately
> --------------------------------------------------------------
>
>                 Key: HDFS-15761
>                 URL: https://issues.apache.org/jira/browse/HDFS-15761
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ye Ni
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> To decommission a dead DN, the complete logic should be
> Dead, NORMAL -> Dead, DECOMMISSION_INPROGRESS -> Dead, DECOMMISSIONED
>
> *Current logic:*
> If a DN is already dead when decommissioning starts, it becomes
> DECOMMISSIONED immediately. DECOMMISSION_INPROGRESS is skipped.
> This logic was introduced by https://issues.apache.org/jira/browse/HDFS-7374
>
> HDFS-7374 was made in response to
> https://issues.apache.org/jira/browse/HDFS-6791.
> HDFS-6791 keeps a node in DECOMMISSION_INPROGRESS if it becomes dead during
> decommission, which could leave a dead DN in DECOMMISSION_INPROGRESS forever
> if the DN never comes back alive.
>
> However, putting a dead DN into DECOMMISSIONED directly is not safe. For
> example, suppose the 3 DNs holding the replicas of the same block are dead at
> the same time, and the administrator then decommissions them. The Namenode
> should check first before transitioning them to DECOMMISSIONED. Otherwise,
> the block would be lost.
> In this case, none of the 3 DNs can become DECOMMISSIONED, which is by
> design. The administrator needs to intervene manually, either repairing the
> dead machine or service or recovering the data, before decommissioning them.
>
> This change adds the Dead, DECOMMISSION_INPROGRESS state back:
> 1. A dead NORMAL DN is put into DECOMMISSION_INPROGRESS first.
> 2. The Namenode then checks that pendingReplicationBlocksCount and
> underReplicatedBlocksCount are both 0.
> 3. The dead DN transitions to DECOMMISSIONED.
>
> Step 2 is implemented by https://issues.apache.org/jira/browse/HDFS-7409,
> which adds a check that allows dead nodes in DECOMMISSION_INPROGRESS to
> progress to DECOMMISSIONED if all files on the filesystem are fully
> replicated. A dead DN is therefore put into DECOMMISSION_INPROGRESS and
> checked before it becomes DECOMMISSIONED.
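
For illustration, the three-step flow proposed in the description (enter DECOMMISSION_INPROGRESS, check the two counters, then move to DECOMMISSIONED) could be sketched roughly as below. This is only a minimal sketch and not the HDFS code or the attached patch: the class DeadNodeDecommissionSketch, the Node type, and the methods startDecommission and tryFinishDecommission are hypothetical names invented here. Only the state names and the two counter names come from the description.

```java
// Hypothetical sketch of the proposed flow; not the actual HDFS implementation.
final class DeadNodeDecommissionSketch {

    // The three admin states named in the issue description.
    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    // Illustrative stand-in for a DataNode as tracked by the NameNode.
    static final class Node {
        boolean alive = false;               // this sketch models a dead DN
        AdminState state = AdminState.NORMAL;
    }

    // Step 1: a NORMAL node always enters DECOMMISSION_INPROGRESS first,
    // even when it is already dead.
    static void startDecommission(Node node) {
        if (node.state == AdminState.NORMAL) {
            node.state = AdminState.DECOMMISSION_INPROGRESS;
        }
    }

    // Steps 2 and 3: a dead node in DECOMMISSION_INPROGRESS moves to
    // DECOMMISSIONED only when both counters are 0.
    // Returns true if the transition happened.
    static boolean tryFinishDecommission(Node node,
                                         long pendingReplicationBlocksCount,
                                         long underReplicatedBlocksCount) {
        if (node.state != AdminState.DECOMMISSION_INPROGRESS) {
            return false;
        }
        if (node.alive) {
            // Live nodes follow the regular decommission path (not modeled here).
            return false;
        }
        if (pendingReplicationBlocksCount == 0 && underReplicatedBlocksCount == 0) {
            node.state = AdminState.DECOMMISSIONED;
            return true;
        }
        return false; // stay in DECOMMISSION_INPROGRESS until an admin intervenes
    }

    public static void main(String[] args) {
        Node dead = new Node();
        startDecommission(dead);
        // Blocks are still under-replicated, so the node must not finish yet.
        tryFinishDecommission(dead, 0, 3);
        System.out.println(dead.state);
        // Once both counters reach 0 the node may finish.
        tryFinishDecommission(dead, 0, 0);
        System.out.println(dead.state);
    }
}
```

In the 3-dead-replicas example from the description, the under-replicated count would stay above 0, so all three nodes would remain in DECOMMISSION_INPROGRESS until the administrator repairs a machine or recovers the data.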