Ye Ni created HDFS-15761:
----------------------------

             Summary: Dead NORMAL DN shouldn't transit to DECOMMISSIONED 
immediately
                 Key: HDFS-15761
                 URL: https://issues.apache.org/jira/browse/HDFS-15761
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Ye Ni


To decommission a dead DN, the complete state transition should be:
Dead, NORMAL -> Dead, DECOMMISSION_INPROGRESS -> Dead, DECOMMISSIONED

*Current logic:*

If a DN is already dead when DECOMMISSIONING starts, it becomes DECOMMISSIONED 
immediately. DECOMMISSION_INPROGRESS is skipped.

This logic was introduced by https://issues.apache.org/jira/browse/HDFS-7374

HDFS-7374 was made in response to https://issues.apache.org/jira/browse/HDFS-6791.

HDFS-6791 keeps a node in the DECOMMISSION_INPROGRESS state if the node becomes 
dead during decommission. This could leave a dead DN in 
DECOMMISSION_INPROGRESS forever if the DN never comes back alive.

However, moving a dead DN to DECOMMISSIONED directly is not safe. For 
example, suppose the 3 DNs holding replicas of the same block are dead at the 
same time, and the administrator then decommissions them. The Namenode should 
check first before transitioning them to DECOMMISSIONED. Otherwise, data would 
be lost.

In this case, none of the 3 DNs can become DECOMMISSIONED, which is by design. 
The administrator needs to intervene manually, either by repairing the dead 
machine or service, or by recovering the data, before decommissioning them.
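
To illustrate the scenario above, here is a minimal sketch that lists the blocks whose replicas all sit on dead DNs. It is only an illustration of why a check is needed, not the Namenode's actual check: the class name, method name, and the map of block IDs to DN names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in HDFS.
class ReplicaSafetySketch {

    /** Returns the IDs of blocks that have no replica on a live DN. */
    static List<String> blocksWithNoLiveReplica(
            Map<String, List<String>> blockToReplicaNodes, Set<String> deadNodes) {
        List<String> unreachable = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : blockToReplicaNodes.entrySet()) {
            boolean hasLiveReplica = false;
            for (String node : e.getValue()) {
                if (!deadNodes.contains(node)) {
                    hasLiveReplica = true;
                    break;
                }
            }
            if (!hasLiveReplica) {
                unreachable.add(e.getKey());
            }
        }
        return unreachable;
    }

    public static void main(String[] args) {
        // blk_1 has all three replicas on the dead DNs; blk_2 still has live copies.
        Map<String, List<String>> blocks = new TreeMap<>(Map.of(
                "blk_1", List.of("dn1", "dn2", "dn3"),
                "blk_2", List.of("dn1", "dn4", "dn5")));
        Set<String> dead = Set.of("dn1", "dn2", "dn3");

        // Marking dn1..dn3 DECOMMISSIONED right away would hide the loss of blk_1.
        System.out.println(blocksWithNoLiveReplica(blocks, dead)); // prints [blk_1]
    }
}
```

If the returned list is non-empty, decommissioning the dead DNs immediately would report success while those blocks have no readable copy.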

This change adds the Dead, DECOMMISSION_INPROGRESS state back:
1. A dead NORMAL DN is put in DECOMMISSION_INPROGRESS first.
2. Then check that pendingReplicationBlocksCount and underReplicatedBlocksCount 
are both 0.
3. Transition the dead DN to DECOMMISSIONED.

Step 2 is already implemented by https://issues.apache.org/jira/browse/HDFS-7409, 
which adds a check that allows dead nodes in DECOMMISSION_INPROGRESS to progress 
to the DECOMMISSIONED state if all files on the filesystem are fully replicated. 
With this change, a dead DN first enters DECOMMISSION_INPROGRESS and is then 
checked before it becomes DECOMMISSIONED.
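
The proposed three-step transition can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual Namenode code: the class, the DeadNode type, and both methods are hypothetical, and the two counters are passed in as plain arguments named after the counts mentioned above.

```java
// Hypothetical, simplified model of the proposed transition; not HDFS code.
class DeadNodeDecommissionSketch {

    // Stand-in for the admin states named in this issue.
    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    // A DN that is assumed to be dead for the whole example.
    static final class DeadNode {
        AdminState state = AdminState.NORMAL;
    }

    /** Step 1: a dead NORMAL DN enters DECOMMISSION_INPROGRESS, never DECOMMISSIONED. */
    static void startDecommission(DeadNode dn) {
        if (dn.state == AdminState.NORMAL) {
            dn.state = AdminState.DECOMMISSION_INPROGRESS;
        }
    }

    /** Steps 2 and 3: move to DECOMMISSIONED only when both counts are 0. */
    static void checkProgress(DeadNode dn,
                              long pendingReplicationBlocksCount,
                              long underReplicatedBlocksCount) {
        if (dn.state == AdminState.DECOMMISSION_INPROGRESS
                && pendingReplicationBlocksCount == 0
                && underReplicatedBlocksCount == 0) {
            dn.state = AdminState.DECOMMISSIONED;
        }
    }

    public static void main(String[] args) {
        DeadNode dn = new DeadNode();

        startDecommission(dn);
        System.out.println(dn.state); // prints DECOMMISSION_INPROGRESS

        checkProgress(dn, 0, 2);      // blocks are still under-replicated
        System.out.println(dn.state); // prints DECOMMISSION_INPROGRESS

        checkProgress(dn, 0, 0);      // everything is fully replicated
        System.out.println(dn.state); // prints DECOMMISSIONED
    }
}
```

In this sketch a dead DN whose blocks cannot be re-replicated stays in DECOMMISSION_INPROGRESS until an administrator intervenes, which matches the by-design behavior described above.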



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

