[
https://issues.apache.org/jira/browse/HDFS-15809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297395#comment-17297395
]
Lisheng Sun edited comment on HDFS-15809 at 3/8/21, 2:15 PM:
-------------------------------------------------------------
Thanks [~LiJinglun] for your patient work.
LGTM. +1 for [^HDFS-15809.007.patch].
was (Author: leosun08):
Thanks [~LiJinglun] for your patient work.
LGTM. +1 for https://issues.apache.org/jira/secure/attachment/13021765/HDFS-15809.007.patch
> DeadNodeDetector doesn't remove live nodes from dead node set.
> --------------------------------------------------------------
>
> Key: HDFS-15809
> URL: https://issues.apache.org/jira/browse/HDFS-15809
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: HDFS-15809.001.patch, HDFS-15809.002.patch,
> HDFS-15809.003.patch, HDFS-15809.004.patch, HDFS-15809.005.patch,
> HDFS-15809.006.patch, HDFS-15809.007.patch
>
>
> We found that the DeadNodeDetector might never remove live nodes from the
> dead node set in a large cluster. For example:
> # 200 nodes are added to the dead node set by DeadNodeDetector.
> # DeadNodeDetector#checkDeadNodes() adds 100 nodes to the
> deadNodesProbeQueue because the queue's length is limited to 100.
> # The probe threads start working and probe 30 nodes.
> # DeadNodeDetector#checkDeadNodes() is scheduled again. It iterates over the
> dead node set and adds 30 nodes to the deadNodesProbeQueue, refilling the 30
> slots freed by the probes. But the iteration order is the same as last time,
> so the 30 nodes that have already been probed are added to the queue again.
> # Steps 3 and 4 repeat, but each time the first 30 nodes of the dead set are
> the ones added. If they are all dead, the live nodes behind them are never
> probed and can never be recovered.
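The starvation in the steps above can be reproduced with a small single-threaded simulation. This is only a sketch under simplifying assumptions, not the DeadNodeDetector code: node IDs are integers inserted in ascending order into a LinkedHashSet (so iteration order equals ID order), the deadNodesProbeQueue is modeled as an ArrayDeque capped at 100 entries, exactly 30 nodes are probed between two checkDeadNodes() runs, and nodes 0 to 29 are the only truly dead ones. The class DeadNodeProbeSim, the helper scanOrder() and the "rotate" option are hypothetical; resuming the scan after the last enqueued node is just one possible remedy and is not necessarily what the attached patches do.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of the probe-queue starvation; not Hadoop code.
class DeadNodeProbeSim {
  static final int QUEUE_CAPACITY = 100;  // "the queue limited length is 100"
  static final int PROBES_PER_ROUND = 30; // nodes probed between two checkDeadNodes() runs

  // Dead-set nodes with ID > cursor first, then the rest (wrap-around).
  // Relies on the assumption that set order equals ascending ID order.
  static List<Integer> scanOrder(Set<Integer> deadSet, int cursor) {
    List<Integer> after = new ArrayList<>();
    List<Integer> before = new ArrayList<>();
    for (int node : deadSet) {
      if (node > cursor) after.add(node); else before.add(node);
    }
    after.addAll(before);
    return after;
  }

  // Returns the size of the dead set after the given number of rounds.
  // rotate == false: every scan starts at the head of the dead set (reported bug).
  // rotate == true: each scan resumes after the last node it enqueued (hypothetical fix).
  static int remainingDeadNodes(int totalNodes, int reallyDead, int rounds, boolean rotate) {
    Set<Integer> deadSet = new LinkedHashSet<>();
    for (int i = 0; i < totalNodes; i++) deadSet.add(i);
    ArrayDeque<Integer> probeQueue = new ArrayDeque<>();
    int cursor = -1; // last node ID enqueued; only consulted when rotate is true

    for (int r = 0; r < rounds; r++) {
      // checkDeadNodes(): fill the free slots of the bounded probe queue.
      for (int node : scanOrder(deadSet, rotate ? cursor : -1)) {
        if (probeQueue.size() >= QUEUE_CAPACITY) break;
        probeQueue.add(node);
        cursor = node;
      }
      // Probe threads: take up to PROBES_PER_ROUND nodes from the queue head.
      for (int p = 0; p < PROBES_PER_ROUND && !probeQueue.isEmpty(); p++) {
        int node = probeQueue.poll();
        if (node >= reallyDead) deadSet.remove(node); // live node: recover it
      }
    }
    return deadSet.size();
  }

  public static void main(String[] args) {
    System.out.println("fixed-order scan: " + remainingDeadNodes(200, 30, 50, false)); // 130
    System.out.println("rotating scan:    " + remainingDeadNodes(200, 30, 50, true));  // 30
  }
}
```

With the fixed-order scan, once nodes 30 to 99 have been probed the queue only ever holds nodes 0 to 29, so the 100 live nodes with IDs 100 to 199 are never probed and the dead set keeps 130 entries. When each scan resumes after the last enqueued node, every live node reaches the queue within a few rounds and only the 30 dead nodes remain.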
--
This message was sent by Atlassian Jira
(v8.3.4#803005)