[jira] [Commented] (HDFS-14651) DeadNodeDetector checks dead node periodically

Yiqun Lin (Jira) Tue, 19 Nov 2019 03:58:42 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977420#comment-16977420
 ]


Yiqun Lin commented on HDFS-14651:
----------------------------------

Thanks for addressing the comment, [~leosun08]. Some review comments:

{code}
private static long CONNECTION_TIMEOUT_MS = 20 * 1000; // 20ms

long DFS_CLIENT_DEAD_NODE_DETECTION_INTERVAL_DEFAULT = 60 * 1000; // 60ms
{code}
Here the comment should be 20s, 60s.

{code}
this.deadNodesProbeQueue = new ArrayBlockingQueue(maxDeadNodesProbeQueueLen);
{code}
Can we update this to new 
ArrayBlockingQueue<DatanodeInfo>(maxDeadNodesProbeQueueLen);

I think we can simplify this
{code}
+  private void checkDeadNodes() {
+    ...
+      for (DatanodeInfo datanodeInfo : datanodeInfos) {
+        LOG.debug("Add dead node to check: {}.", datanodeInfo);
+        if (isDeadNodesProbeQueueFull()) {
+          break;
+        }
+        addToDeadNodesProbeQueue(datanodeInfo);
+      ...
+  }
{code}
to
{code}
for (DatanodeInfo datanodeInfo : datanodeInfos) {
        LOG.debug("Add dead node to check: {}.", datanodeInfo);
        if(!addToDeadNodesProbeQueue(datanodeInfo)) {
                LOG.debug("Skip to add dead node {} to check since the proboe 
queue is full.", datanodeInfo);
                break;
        }
      }
{code}
And following three methods can be removed
*  addToDeadNodesProbeQueue
*  isDeadNodesProbeQueueFull
* pollFromDeadNodesProbeQueue

{code}
  private void removeFromDead(DatanodeInfo datanodeInfo) {
    deadNodes.remove(datanodeInfo.getDatanodeUuid());
  }
{code}
When we remove one dead node, can we also remove the dead node from 
dfsInputStreamNodes and DFSInputStream local dead nodes? DFSInputStream local 
dead will hold stale dead node but actually we check that this is not dead node 
now.

{code}
+  /**
+   * Start probe dead node thread.
+   */
+  private void startProbeScheduler() {
+    probeDeadNodesSchedulerThr =
+            new Thread(new ProbeScheduler(this, ProbeType.CHECK_DEAD));
+    probeDeadNodesSchedulerThr.setDaemon(true);
+    probeDeadNodesSchedulerThr.start();
+  }
{code}
We should have corresponding close operations of detector and stop these 
threads.

I noticed that here we only have one type CHECK_DEAD not other probe type like 
suspicious node type? Do we still need to define this type? Seems it's okay 
that we don't need to add suspicious state. 

> DeadNodeDetector checks dead node periodically
> ----------------------------------------------
>
>                 Key: HDFS-14651
>                 URL: https://issues.apache.org/jira/browse/HDFS-14651
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14651.001.patch, HDFS-14651.002.patch, 
> HDFS-14651.003.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HDFS-14651) DeadNodeDetector checks dead node periodically

Reply via email to