[jira] [Updated] (HDFS-15761) Dead NORMAL DN shouldn't transit to DECOMMISSIONED immediately

Jira Tue, 05 Jan 2021 09:41:34 -0800


     [ 
https://issues.apache.org/jira/browse/HDFS-15761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Íñigo Goiri updated HDFS-15761:
-------------------------------
    Status: Patch Available  (was: Open)

> Dead NORMAL DN shouldn't transit to DECOMMISSIONED immediately
> --------------------------------------------------------------
>
>                 Key: HDFS-15761
>                 URL: https://issues.apache.org/jira/browse/HDFS-15761
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ye Ni
>            Assignee: Ye Ni
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> To decommission a dead DN, the complete logic should be:
>  Dead, NORMAL -> Dead, DECOMMISSION_INPROGRESS -> Dead, DECOMMISSIONED
> *Current logic:*
> If a DN is already dead when DECOMMISSIONING starts, it becomes 
> DECOMMISSIONED immediately. DECOMMISSION_INPROGRESS is skipped.
> This logic was introduced by HDFS-7374 in response to HDFS-6791.
> HDFS-6791 keeps a node in the DECOMMISSION_INPROGRESS state if it becomes 
> dead during decommissioning, which could leave a dead DN in 
> DECOMMISSION_INPROGRESS forever if the DN never comes back alive.
> However, moving a dead DN directly to DECOMMISSIONED is not safe. For 
> example, suppose the 3 DNs holding the replicas of the same block are dead 
> at the same time and the administrator unintentionally decommissions them. 
> The Namenode should check first before transitioning them to 
> DECOMMISSIONED; otherwise, data would be lost.
> In this case, by design, none of the 3 DNs can become DECOMMISSIONED. The 
> administrator needs to intervene manually, either repairing the dead 
> machine or service or recovering the data, before taking action on them.
> *This change adds the Dead, DECOMMISSION_INPROGRESS state back.*
>  1. A dead NORMAL DN first enters DECOMMISSION_INPROGRESS.
>  2. The NN checks that pendingReplicationBlocksCount and 
> underReplicatedBlocksCount are both 0.
>  3. The NN transitions the dead DN to DECOMMISSIONED.
> Step 2 is implemented by HDFS-7409, which adds a check that allows dead 
> nodes in DECOMMISSION_INPROGRESS to progress to the DECOMMISSIONED state if 
> all files on the filesystem are fully replicated.
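
The three steps in the description can be sketched as a small state machine. This is only an illustration under stated assumptions, not the actual Hadoop code: the class `DeadNodeDecommissionSketch` and the methods `startDecommission` and `checkDeadNode` are hypothetical names. Only the state names and the two counters (pendingReplicationBlocksCount, underReplicatedBlocksCount) come from the issue text.

```java
/**
 * Hypothetical sketch of the proposed transitions for a dead DN.
 * None of these names are Hadoop APIs; they only model the issue text.
 */
final class DeadNodeDecommissionSketch {

    /** The three admin states named in the issue. */
    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    /**
     * Step 1: a NORMAL node enters DECOMMISSION_INPROGRESS first,
     * even if it is already dead. Other states are left unchanged.
     */
    static AdminState startDecommission(AdminState current) {
        return current == AdminState.NORMAL
                ? AdminState.DECOMMISSION_INPROGRESS
                : current;
    }

    /**
     * Steps 2 and 3: a dead node in DECOMMISSION_INPROGRESS moves to
     * DECOMMISSIONED only when both counters are 0; otherwise it stays
     * in DECOMMISSION_INPROGRESS and waits for manual intervention.
     */
    static AdminState checkDeadNode(AdminState current,
                                    long pendingReplicationBlocksCount,
                                    long underReplicatedBlocksCount) {
        if (current != AdminState.DECOMMISSION_INPROGRESS) {
            return current;
        }
        boolean fullyReplicated = pendingReplicationBlocksCount == 0
                && underReplicatedBlocksCount == 0;
        return fullyReplicated
                ? AdminState.DECOMMISSIONED
                : AdminState.DECOMMISSION_INPROGRESS;
    }

    public static void main(String[] args) {
        AdminState state = startDecommission(AdminState.NORMAL);
        System.out.println(state);                       // DECOMMISSION_INPROGRESS
        // All replicas of some block are on dead DNs: the node must wait.
        System.out.println(checkDeadNode(state, 0, 3));  // DECOMMISSION_INPROGRESS
        // Everything is fully replicated: the node may finish.
        System.out.println(checkDeadNode(state, 0, 0));  // DECOMMISSIONED
    }
}
```

In the data-loss scenario from the description, the under-replicated count stays above 0 while all 3 DNs are dead, so in this sketch none of them leaves DECOMMISSION_INPROGRESS until the data is recovered.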



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
