[ 
https://issues.apache.org/jira/browse/HDFS-15809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289599#comment-17289599
 ] 

Jinglun commented on HDFS-15809:
--------------------------------

Hi [~leosun08], thanks for your comments. I submitted v04, which uses
LinkedHashSet. The test case testDeadNodeDetectionDeadNodeProbe covers this
situation. It verifies the whole process of the DeadNodeDetector: a node should
first be put into the suspect queue, then marked as dead, and finally probed
through the dead queue multiple times. In the original implementation the 3
datanodes won't all be dead.

> DeadNodeDetector doesn't remove live nodes from dead node set.
> --------------------------------------------------------------
>
>                 Key: HDFS-15809
>                 URL: https://issues.apache.org/jira/browse/HDFS-15809
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: HDFS-15809.001.patch, HDFS-15809.002.patch, 
> HDFS-15809.003.patch, HDFS-15809.004.patch
>
>
> We found that the dead node detector might never remove live nodes from the 
> dead node set in a big cluster. For example:
>  # 200 nodes are added to the dead node set by DeadNodeDetector.
>  # DeadNodeDetector#checkDeadNodes() adds 100 nodes to the 
> deadNodesProbeQueue because the queue length is limited to 100.
>  # The probe threads start working and probe 30 nodes.
>  # DeadNodeDetector#checkDeadNodes() is scheduled again. It iterates over the 
> dead node set and adds 30 nodes to the deadNodesProbeQueue. But the iteration 
> order is the same as last time, so the 30 nodes that have already been probed 
> are added to the queue again.
>  # Steps 3 and 4 repeat, but we always add the first 30 nodes from the dead 
> set. If they are all dead, then the live nodes behind them can never be 
> recovered.
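
The steps in the description can be reproduced with a small single-threaded
simulation. This is only a rough sketch, not the code in the attached patches:
the class name DeadNodeProbeSketch, the simulate() method, and the "rotate"
strategy (moving enqueued nodes to the tail of an insertion-ordered set) are
assumptions made for illustration. Only the numbers 200, 100 and 30 come from
the description above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical model of the probe scheduling described in the issue. */
public class DeadNodeProbeSketch {
    static final int QUEUE_LIMIT = 100;      // queue length limit from the description
    static final int PROBES_PER_ROUND = 30;  // nodes probed between two scheduling rounds

    /** Returns how many distinct nodes are probed at least once. */
    static int simulate(int nodeCount, int rounds, boolean rotate) {
        // Insertion-ordered dead node set; nodes are modeled as plain integers.
        LinkedHashSet<Integer> deadNodes = new LinkedHashSet<>();
        for (int i = 0; i < nodeCount; i++) {
            deadNodes.add(i);
        }
        Queue<Integer> probeQueue = new ArrayDeque<>();
        Set<Integer> probed = new HashSet<>();

        for (int round = 0; round < rounds; round++) {
            // Scheduling step: walk the dead set in order until the queue is full.
            List<Integer> enqueued = new ArrayList<>();
            for (Integer node : deadNodes) {
                if (probeQueue.size() >= QUEUE_LIMIT) {
                    break;
                }
                probeQueue.add(node);
                enqueued.add(node);
            }
            if (rotate) {
                // Assumed fix: re-append enqueued nodes so the next round
                // starts with nodes that have not been scheduled yet.
                for (Integer node : enqueued) {
                    deadNodes.remove(node);
                    deadNodes.add(node);
                }
            }
            // Probe step: the probe threads drain a fixed number of nodes.
            for (int i = 0; i < PROBES_PER_ROUND && !probeQueue.isEmpty(); i++) {
                probed.add(probeQueue.poll());
            }
        }
        return probed.size();
    }

    public static void main(String[] args) {
        System.out.println("fixed order: " + simulate(200, 50, false)); // 100
        System.out.println("rotating:    " + simulate(200, 50, true));  // 200
    }
}
```

In this model the fixed iteration order never probes the second half of the
200 nodes, while re-appending scheduled nodes lets every node be probed.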



