[ https://issues.apache.org/jira/browse/HDFS-15761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Íñigo Goiri updated HDFS-15761:
-------------------------------
    Status: Patch Available  (was: Open)

> Dead NORMAL DN shouldn't transit to DECOMMISSIONED immediately
> --------------------------------------------------------------
>
>                 Key: HDFS-15761
>                 URL: https://issues.apache.org/jira/browse/HDFS-15761
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ye Ni
>            Assignee: Ye Ni
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> To decommission a dead DN, the complete logic should be
> Dead, NORMAL -> Dead, DECOMMISSION_INPROGRESS -> Dead, DECOMMISSIONED
> *Current logic:*
> If a DN is already dead when decommissioning starts, it becomes
> DECOMMISSIONED immediately. DECOMMISSION_INPROGRESS is skipped.
> This logic was introduced by HDFS-7374, which was made because of HDFS-6791.
> HDFS-6791 keeps the node in the DECOMMISSION_INPROGRESS state if the node
> becomes dead during decommission, which could leave a dead DN in
> DECOMMISSION_INPROGRESS forever if the DN never comes back alive.
> However, putting a dead DN directly into DECOMMISSIONED is not safe. For
> example, suppose 3 DNs holding the same block are dead at the same time and
> the administrator unintentionally decommissions them. The Namenode should
> check first before transitioning them to DECOMMISSIONED. Otherwise, the data
> would be lost.
> In this case, none of the 3 DNs can become DECOMMISSIONED, which is by
> design. The administrator needs to intervene manually, either repairing the
> dead machine or service or recovering the data, before taking action on them.
> *This change is to add Dead, DECOMMISSION_INPROGRESS back.*
> 1. A dead NORMAL DN goes to DECOMMISSION_INPROGRESS first.
> 2. NN checks that pendingReplicationBlocksCount and underReplicatedBlocksCount
> are both 0.
> 3. Transition the dead DN to DECOMMISSIONED.
> Step 2 is implemented by HDFS-7409, which adds a check to allow dead nodes in
> DECOMMISSION_INPROGRESS to progress to the DECOMMISSIONED state if all files
> on the filesystem are fully replicated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
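
The three-step transition proposed in the issue description can be sketched as a small state machine. This is only an illustration under stated assumptions: the enum, class, and method names below are hypothetical and are not taken from the HDFS code base or from the patch. Only the state names and the two counters (pendingReplicationBlocksCount, underReplicatedBlocksCount) come from the description.

```java
// Hypothetical sketch; none of these types or methods exist in HDFS.
enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

final class DeadNodeDecommissionSketch {

    /**
     * Step 1: a dead NORMAL node enters DECOMMISSION_INPROGRESS
     * instead of jumping straight to DECOMMISSIONED.
     */
    static AdminState startDecommission(AdminState current) {
        return current == AdminState.NORMAL
                ? AdminState.DECOMMISSION_INPROGRESS
                : current;
    }

    /**
     * Steps 2 and 3: the node may leave DECOMMISSION_INPROGRESS only
     * when both counters are 0; otherwise it stays where it is.
     */
    static AdminState checkProgress(AdminState current,
                                    long pendingReplicationBlocksCount,
                                    long underReplicatedBlocksCount) {
        if (current != AdminState.DECOMMISSION_INPROGRESS) {
            return current;
        }
        boolean fullyReplicated = pendingReplicationBlocksCount == 0
                && underReplicatedBlocksCount == 0;
        return fullyReplicated
                ? AdminState.DECOMMISSIONED
                : AdminState.DECOMMISSION_INPROGRESS;
    }

    public static void main(String[] args) {
        AdminState s = startDecommission(AdminState.NORMAL);
        System.out.println(s);
        // Blocks are still under-replicated, so the node must wait.
        s = checkProgress(s, 0, 3);
        System.out.println(s);
        // Both counters have reached 0, so the node may finish.
        s = checkProgress(s, 0, 0);
        System.out.println(s);
    }
}
```

In the data-loss scenario from the description (all 3 DNs holding a block are dead), the under-replicated count would presumably never reach 0, so in this sketch checkProgress keeps returning DECOMMISSION_INPROGRESS until an administrator repairs a node or recovers the data.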