[
https://issues.apache.org/jira/browse/HDFS-15761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043291#comment-18043291
]
ASF GitHub Bot commented on HDFS-15761:
---------------------------------------
github-actions[bot] commented on PR #2588:
URL: https://github.com/apache/hadoop/pull/2588#issuecomment-3621410983
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> Dead NORMAL DN shouldn't transit to DECOMMISSIONED immediately
> --------------------------------------------------------------
>
> Key: HDFS-15761
> URL: https://issues.apache.org/jira/browse/HDFS-15761
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ye Ni
> Assignee: Ye Ni
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> To decommission a dead DN, the complete logic should be
> Dead, NORMAL -> Dead, DECOMMISSION_INPROGRESS -> Dead, DECOMMISSIONED
> *Current logic:*
> If a DN is already dead when decommissioning starts, it becomes
> DECOMMISSIONED immediately; DECOMMISSION_INPROGRESS is skipped.
> This behavior was introduced by HDFS-7374 in response to HDFS-6791.
> HDFS-6791 keeps a node in the DECOMMISSION_INPROGRESS state if it becomes
> dead during decommission, which could leave a dead DN in
> DECOMMISSION_INPROGRESS forever if the DN never comes back alive.
> However, moving a dead DN to DECOMMISSIONED directly is not safe. For
> example, suppose the 3 DNs holding the same block are dead at the same time
> and the administrator unintentionally decommissions them. The NameNode should
> check first before transitioning them to DECOMMISSIONED. Otherwise, data
> would be lost.
> In this case, none of the 3 DNs can become DECOMMISSIONED, which is by design.
> The administrator needs to intervene manually, either repairing the dead
> machine or service or recovering the data, before taking action on them.
> *This change adds the Dead, DECOMMISSION_INPROGRESS state back.*
> 1. A dead NORMAL DN first enters DECOMMISSION_INPROGRESS.
> 2. The NN checks that pendingReplicationBlocksCount and
> underReplicatedBlocksCount are both 0.
> 3. The NN transitions the dead DN to DECOMMISSIONED.
> Step 2 is implemented by HDFS-7409, which adds a check that allows dead nodes
> in DECOMMISSION_INPROGRESS to progress to the DECOMMISSIONED state if all
> files on the filesystem are fully replicated.
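
The three-step transition proposed in the issue description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from PR #2588 or from the HDFS NameNode: the class DeadNodeDecommissionSketch, its AdminState enum, and the method nextStateForDeadNode are hypothetical names, and the two counters are passed in as plain parameters named after the counts mentioned in the description.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of the proposed state machine for a dead DataNode.
 * It is NOT the HDFS implementation; all names here are illustrative.
 */
final class DeadNodeDecommissionSketch {

    /** Admin states, named after the states in the issue description. */
    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    private DeadNodeDecommissionSketch() { }

    /**
     * Computes the next admin state of a dead DataNode for which
     * decommissioning has been requested.
     *
     * @param current the node's current admin state
     * @param pendingReplicationBlocksCount cluster-wide pending replications (assumed input)
     * @param underReplicatedBlocksCount cluster-wide under-replicated blocks (assumed input)
     */
    static AdminState nextStateForDeadNode(AdminState current,
                                           long pendingReplicationBlocksCount,
                                           long underReplicatedBlocksCount) {
        Objects.requireNonNull(current, "current");
        switch (current) {
            case NORMAL: {
                // Step 1: never jump straight to DECOMMISSIONED.
                return AdminState.DECOMMISSION_INPROGRESS;
            }
            case DECOMMISSION_INPROGRESS: {
                // Step 2: proceed only when nothing is waiting to be re-replicated.
                boolean fullyReplicated = pendingReplicationBlocksCount == 0
                        && underReplicatedBlocksCount == 0;
                // Step 3: finish, or stay put so an administrator can intervene.
                return fullyReplicated
                        ? AdminState.DECOMMISSIONED
                        : AdminState.DECOMMISSION_INPROGRESS;
            }
            default: {
                // DECOMMISSIONED is terminal in this sketch.
                return current;
            }
        }
    }
}
```

In this sketch, the scenario from the description (all DNs holding a block are dead) keeps the under-replicated count above 0, so the nodes stay in DECOMMISSION_INPROGRESS until the data is recovered.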