[ 
https://issues.apache.org/jira/browse/HDFS-15809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15809:
------------------------------
          Component/s: datanode
         Hadoop Flags: Reviewed
     Target Version/s: 3.3.1, 3.4.0
    Affects Version/s: 3.3.1
                       3.4.0

> DeadNodeDetector doesn't remove live nodes from dead node set.
> --------------------------------------------------------------
>
>                 Key: HDFS-15809
>                 URL: https://issues.apache.org/jira/browse/HDFS-15809
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.3.1, 3.4.0
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>             Fix For: 3.3.1, 3.4.0
>
>         Attachments: HDFS-15809.001.patch, HDFS-15809.002.patch, 
> HDFS-15809.003.patch, HDFS-15809.004.patch, HDFS-15809.005.patch, 
> HDFS-15809.006.patch, HDFS-15809.007.patch
>
>
> We found the dead node detector might never remove the alive nodes from the 
> dead node set in a big cluster. For example:
>  # 200 nodes are added to the dead node set by DeadNodeDetector.
>  # DeadNodeDetector#checkDeadNodes() adds 100 nodes to the 
> deadNodesProbeQueue because the queue limited length is 100.
>  # The probe threads start working and probe 30 nodes.
>  # DeadNodeDetector#checkDeadNodes() is scheduled again. It iterates the dead 
> node set  and adds 30 nodes to the deadNodesProbeQueue. But the order is the 
> same as the last time. So the 30 nodes that has already been probed are added 
> to the queue again.
>  # Repeat 3 and 4. But we always add the first 30 nodes from the dead set. If 
> they are all dead then the live nodes behind them could never be recovered.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to