[jira] [Commented] (HDFS-14651) DeadNodeDetector checks dead node periodically

Lisheng Sun (Jira) Sat, 01 Feb 2020 20:16:13 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028269#comment-17028269
 ]


Lisheng Sun commented on HDFS-14651:
------------------------------------

Thanks [~ahussein] for your questions.
{quote}1. What is the purpose of deadNodeDetectInterval? As far as I understand, 
every call to checkDeadNodes() will change the state to IDLE forcing the 
DeadNodeDetector to sleep for IDLE_SLEEP_MS. So, why do we need 
deadNodeDetectInterval if the actual time gap between every check is 
IDLE_SLEEP_MS?
{quote}
The deadNodeDetectInterval check in checkDeadNodes() is not really necessary, 
since idle() is called after checkDeadNodes().
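
To illustrate the timing in question, here is a small simulation. It assumes checkDeadNodes() only probes when more than deadNodeDetectInterval has elapsed since the last probe, and that idle() then sleeps for IDLE_SLEEP_MS. The class, the method names, and the numbers are hypothetical, and the real code may differ:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of the detector loop timing; not the HDFS class. */
class DetectorLoopSketch {
  /**
   * Returns the simulated timestamps (ms) at which a probe pass actually runs.
   * Each iteration models one checkDeadNodes() call followed by one idle() sleep.
   */
  static List<Long> probeTimes(long idleSleepMs, long detectIntervalMs, int iterations) {
    List<Long> times = new ArrayList<>();
    long now = 0;
    // Initialised so that the very first pass always probes.
    long lastDetect = -detectIntervalMs - 1;
    for (int i = 0; i < iterations; i++) {
      // CHECK_DEAD state: probe only if the interval has elapsed (assumed guard).
      if (now - lastDetect > detectIntervalMs) {
        times.add(now);
        lastDetect = now;
      }
      // IDLE state: the loop always sleeps before the next check.
      now += idleSleepMs;
    }
    return times;
  }
}
```

In this model, probeTimes(10000, 1000, 4) yields probes 10000 ms apart, so the sleep dictates the gap. The interval only has an effect when it is longer than the sleep.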
{quote} stopDeadNodeDetectorThread.stopDeadNodeDetectorThread() is supposed to 
stop the deadNodeDetector thread; but it looks like the implementation of the 
runnable never terminates. DeadNodeDetector suppresses all interrupts and never 
checks for a termination flag. Therefore, the caller will just hang for 3 
seconds waiting to join.
{quote}
{code:java}
  /**
   * Close dead node detector thread.
   */
  public void stopDeadNodeDetectorThread() {
    if (deadNodeDetectorThr != null) {
      deadNodeDetectorThr.interrupt();
      try {
        deadNodeDetectorThr.join(3000);
      } catch (InterruptedException e) {
        LOG.warn("Encountered exception while waiting to join on dead " +
            "node detector thread.", e);
      }
    }
  }
{code}
I will remove the 3s timeout in deadNodeDetectorThr.join() that is used while 
waiting for the deadNodeDetector thread to stop.
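
For the join to return promptly, the detector's run loop has to exit once it is interrupted. A minimal sketch of one way to do that, using a hypothetical InterruptAwareDetector class (not the real DeadNodeDetector) and an assumed IDLE_SLEEP_MS value:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the detector runnable; not the HDFS class. */
class InterruptAwareDetector implements Runnable {
  private static final long IDLE_SLEEP_MS = 10_000; // assumed value
  private final AtomicInteger checks = new AtomicInteger();

  @Override
  public void run() {
    // The interrupt flag doubles as the termination signal.
    while (!Thread.currentThread().isInterrupted()) {
      checks.incrementAndGet(); // placeholder for checkDeadNodes()
      try {
        Thread.sleep(IDLE_SLEEP_MS); // placeholder for idle()
      } catch (InterruptedException e) {
        // Restore the flag instead of swallowing it, so the loop exits.
        Thread.currentThread().interrupt();
      }
    }
  }

  int checks() {
    return checks.get();
  }

  /** Starts a detector, interrupts it, and reports whether it stopped in time. */
  static boolean stopsOnInterrupt(long joinTimeoutMs) {
    Thread t = new Thread(new InterruptAwareDetector(), "detector-sketch");
    t.setDaemon(true);
    t.start();
    t.interrupt();
    try {
      t.join(joinTimeoutMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !t.isAlive();
  }
}
```

With a loop like this, the caller's join() returns as soon as the thread notices the interrupt, whether or not a timeout is passed.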

I will create an issue to fix these two problems. Thank you.

> DeadNodeDetector checks dead node periodically
> ----------------------------------------------
>
>                 Key: HDFS-14651
>                 URL: https://issues.apache.org/jira/browse/HDFS-14651
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: HDFS-14651.001.patch, HDFS-14651.002.patch, 
> HDFS-14651.003.patch, HDFS-14651.004.patch, HDFS-14651.005.patch, 
> HDFS-14651.006.patch, HDFS-14651.007.patch, HDFS-14651.008.patch
>
>
> DeadNodeDetector checks dead node periodically.
> DeadNodeDetector periodically probes the nodes in DeadNodeDetector#deadnode. 
> If the access is successful, the node is removed from 
> DeadNodeDetector#deadnode. Continuous detection of dead nodes is 
> necessary because a DataNode may need to rejoin the cluster after a service 
> restart or machine repair. Without a probe mechanism, the DataNode may be 
> permanently excluded.
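
The probe-and-recover behaviour described in the issue can be sketched roughly as follows. DeadNodeProbeSketch, its methods, and the use of String node names are all hypothetical and only illustrate the idea; this is not the DeadNodeDetector implementation:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical model of a dead-node set with a periodic probe pass. */
class DeadNodeProbeSketch {
  private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();
  private final Predicate<String> probe; // returns true if the node is reachable

  DeadNodeProbeSketch(Predicate<String> probe) {
    this.probe = probe;
  }

  void markDead(String node) {
    deadNodes.add(node);
  }

  boolean isDead(String node) {
    return deadNodes.contains(node);
  }

  /** Runs one probe pass; returns how many nodes were removed from the dead set. */
  int probeOnce() {
    int recovered = 0;
    // Iterating a concurrent key set while removing from it is safe.
    for (String node : deadNodes) {
      if (probe.test(node) && deadNodes.remove(node)) {
        recovered++;
      }
    }
    return recovered;
  }
}
```

A scheduler would call probeOnce() repeatedly. Without such a pass, a node that was marked dead once would stay excluded even after it came back.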



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
