[jira] [Commented] (HDFS-15806) DeadNodeDetector should close all the threads when it is closed.

Jinglun (Jira) Thu, 18 Feb 2021 22:40:06 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-15806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286871#comment-17286871
 ]


Jinglun commented on HDFS-15806:
--------------------------------

Hi [~ayushtkn], thanks for your comments!
{quote}before this was there some kind of memory leak, or these threads were 
getting cleared later?
{quote}
At Xiaomi we use the dead node detector feature only for HBase. HBase 
doesn't close the file system or the DFS client, so we hadn't noticed the 
leak before. Recently we found that the dead node detector won't remove alive 
nodes from the dead node set, as described in HDFS-15809. So I started 
reviewing the whole feature and found this leak bug.
{quote}Secondly, for the shutdown is there some specific order, or it is just 
random
{quote}
It is random. Most of the threads are connected by queues (the 
producer-consumer model), so the order in which the producers and consumers 
are stopped won't be a problem.

1) The DeadNodeDetector thread is responsible for adding nodes from the 
_suspectAndDeadNodes_ set to _deadNodesProbeQueue_.

2) The _probeDeadNodesSchedulerThr_ is responsible for taking nodes from 
_deadNodesProbeQueue_ and submitting probe tasks to _probeDeadNodesThreadPool_.

3) The _probeSuspectNodesSchedulerThr_ is responsible for taking nodes from 
_suspectNodesProbeQueue_ and submitting probe tasks to 
_probeSuspectNodesThreadPool_.

4) All the probe tasks submit getDatanodeInfo RPC calls to the thread pool 
_rpcThreadPool_.
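
To illustrate why the stop order should not matter in a queue-connected design, here is a minimal, self-contained sketch of one producer -> queue -> scheduler -> pool stage. It is not the DeadNodeDetector code: the class name ProbePipelineSketch, the string node names, the queue size and the empty probe() body are all assumptions made up for the example. Each piece only blocks on the queue or the pool, so each one can be interrupted or shut down independently.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified model of one queue-connected stage. */
public class ProbePipelineSketch implements AutoCloseable {
  private final BlockingQueue<String> probeQueue = new ArrayBlockingQueue<>(16);
  private final ExecutorService probePool = Executors.newFixedThreadPool(2);
  private final Thread producer;
  private final Thread scheduler;

  public ProbePipelineSketch() {
    // Producer: puts (made-up) node names on the queue until interrupted.
    producer = new Thread(() -> {
      try {
        int i = 0;
        while (!Thread.currentThread().isInterrupted()) {
          probeQueue.put("node-" + i++); // blocks while the queue is full
          Thread.sleep(10);
        }
      } catch (InterruptedException e) {
        // Interrupted by close(): fall through and let the thread end.
      }
    }, "producer");

    // Scheduler: takes nodes from the queue and submits probe tasks.
    scheduler = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          String node = probeQueue.take(); // blocks while the queue is empty
          probePool.execute(() -> probe(node));
        }
      } catch (InterruptedException | RejectedExecutionException e) {
        // Either we were interrupted or the pool was shut down first.
        // Both mean "closing", so the thread just ends.
      }
    }, "scheduler");
  }

  public void start() {
    producer.start();
    scheduler.start();
  }

  private void probe(String node) {
    // Placeholder for the real probe work.
  }

  @Override
  public void close() throws InterruptedException {
    // The three stop calls can be issued in any order.
    probePool.shutdownNow();
    scheduler.interrupt();
    producer.interrupt();
    producer.join(1000);
    scheduler.join(1000);
    probePool.awaitTermination(1, TimeUnit.SECONDS);
  }

  public boolean allStopped() {
    return !producer.isAlive() && !scheduler.isAlive() && probePool.isTerminated();
  }
}
```

In this sketch, stopping the pool before the scheduler only causes a RejectedExecutionException that the scheduler treats as a stop signal, and stopping the scheduler before the producer only leaves a few unprocessed entries in the queue.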

 

Some other thoughts: the thread model is a little complicated and could be 
improved. For example, I think we can make the RPC call in the probe task 
itself instead of submitting it to rpcThreadPool. I first need to figure out 
the purpose of the original design, then maybe start a new Jira for the thread 
improvement later.
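
For comparison, the two shapes being discussed can be sketched roughly as follows. This is only an illustration under assumptions: ProbeCallSketch, probeViaRpcPool and probeInline are hypothetical names, the Callable stands in for the getDatanodeInfo call, and the timeout in the first variant is a guess at what a hand-off to a second pool could be used for, not a statement about the original design.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical comparison of two ways a probe task could make its call. */
public class ProbeCallSketch {

  /** Variant A: hand the call to a second pool and wait with a timeout. */
  static boolean probeViaRpcPool(ExecutorService rpcPool, Callable<String> rpc,
      long timeoutMs) {
    Future<String> future = rpcPool.submit(rpc);
    try {
      future.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck call
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      future.cancel(true);
      return false;
    } catch (ExecutionException e) {
      return false; // the call itself failed
    }
  }

  /** Variant B: make the call directly in the probe task's own thread. */
  static boolean probeInline(Callable<String> rpc) {
    try {
      rpc.call();
      return true;
    } catch (Exception e) {
      return false;
    }
  }
}
```

Variant B needs one thread pool fewer, but the probe thread then blocks for as long as the call does, so any time limit would have to come from the call itself.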

> DeadNodeDetector should close all the threads when it is closed.
> ----------------------------------------------------------------
>
>                 Key: HDFS-15806
>                 URL: https://issues.apache.org/jira/browse/HDFS-15806
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: HDFS-15806.001.patch
>
>
> The DeadNodeDetector doesn't close all the threads when it is closed. This 
> Jira tries to fix this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
