[jira] [Commented] (HDFS-15761) Dead NORMAL DN shouldn't transit to DECOMMISSIONED immediately

ASF GitHub Bot (Jira) Sat, 06 Dec 2025 16:29:39 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-15761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043291#comment-18043291
 ]


ASF GitHub Bot commented on HDFS-15761:
---------------------------------------

github-actions[bot] commented on PR #2588:
URL: https://github.com/apache/hadoop/pull/2588#issuecomment-3621410983

   We're closing this stale PR because it has been open for 100 days with no 
activity. This isn't a judgement on the merit of the PR in any way. It's just a 
way of keeping the PR queue manageable.
   If you feel like this was a mistake, or you would like to continue working 
on it, please feel free to re-open it and ask for a committer to remove the 
stale tag and review again.
   Thanks all for your contribution.




> Dead NORMAL DN shouldn't transit to DECOMMISSIONED immediately
> --------------------------------------------------------------
>
>                 Key: HDFS-15761
>                 URL: https://issues.apache.org/jira/browse/HDFS-15761
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ye Ni
>            Assignee: Ye Ni
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> To decommission a dead DN, the complete state sequence should be:
>  Dead, NORMAL -> Dead, DECOMMISSION_INPROGRESS -> Dead, DECOMMISSIONED
> *Current logic:*
> If a DN is already dead when decommissioning starts, it becomes 
> DECOMMISSIONED immediately. DECOMMISSION_INPROGRESS is skipped.
> This logic was introduced by HDFS-7374 in response to HDFS-6791.
> HDFS-6791 keeps a node in the DECOMMISSION_INPROGRESS state if it becomes 
> dead during decommission, which could leave a dead DN in 
> DECOMMISSION_INPROGRESS forever if the DN never comes back alive.
> However, moving a dead DN to DECOMMISSIONED directly is not safe. For 
> example, suppose the 3 DNs holding the replicas of the same block are dead 
> at the same time and the administrator then inadvertently decommissions 
> them. The Namenode should check before transitioning them to 
> DECOMMISSIONED; otherwise, data would be lost.
> In this case, none of the 3 DNs can become DECOMMISSIONED, which is by 
> design. The administrator needs to intervene manually, either repairing the 
> dead machine or service or recovering the data, before taking action on them.
> *This change adds the Dead, DECOMMISSION_INPROGRESS state back.*
>  1. A dead NORMAL DN enters DECOMMISSION_INPROGRESS first.
>  2. The NN checks that pendingReplicationBlocksCount and 
> underReplicatedBlocksCount are both 0.
>  3. The NN transitions the dead DN to DECOMMISSIONED.
> Step 2 is implemented by HDFS-7409, which adds a check that allows dead 
> nodes in DECOMMISSION_INPROGRESS to progress to the DECOMMISSIONED state if 
> all files on the filesystem are fully replicated.
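
The three steps proposed in the description can be sketched as a small state
function. This is only an illustration under stated assumptions, not the code
in PR #2588 or in the Namenode: the class name, the method name and the
decommissionRequested parameter are hypothetical. The enum constants and the
two counters are taken from the issue text.

```java
/** Hypothetical sketch of the proposed flow; not actual Namenode code. */
final class DeadNodeDecommissionSketch {

    /** The three admin states named in the issue description. */
    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    private DeadNodeDecommissionSketch() { }

    /**
     * Returns the next admin state of a DN that is already dead.
     * The two counters correspond to the cluster-wide values named in step 2.
     */
    static AdminState nextStateForDeadNode(AdminState current,
                                           boolean decommissionRequested,
                                           long pendingReplicationBlocksCount,
                                           long underReplicatedBlocksCount) {
        switch (current) {
            case NORMAL:
                // Step 1: a dead NORMAL DN never jumps straight to DECOMMISSIONED.
                return decommissionRequested
                        ? AdminState.DECOMMISSION_INPROGRESS
                        : AdminState.NORMAL;
            case DECOMMISSION_INPROGRESS:
                // Step 2: proceed only when nothing is pending or under-replicated.
                boolean fullyReplicated = pendingReplicationBlocksCount == 0
                        && underReplicatedBlocksCount == 0;
                // Step 3: finish the transition, or stay put until data is safe.
                return fullyReplicated
                        ? AdminState.DECOMMISSIONED
                        : AdminState.DECOMMISSION_INPROGRESS;
            default:
                // DECOMMISSIONED is terminal in this sketch.
                return current;
        }
    }
}
```

In the data-loss example from the description, the block whose 3 replicas all
sit on dead DNs would keep underReplicatedBlocksCount above 0 (assuming such a
block is counted there), so the DNs would stay in DECOMMISSION_INPROGRESS until
the administrator intervenes.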



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

