[ https://issues.apache.org/jira/browse/HDFS-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028269#comment-17028269 ]
Lisheng Sun commented on HDFS-14651:
------------------------------------

Thanks [~ahussein] for your questions.
{quote}1. What is the usage of deadNodeDetectInterval? As far as I understand, every call to checkDeadNodes() will change the state to IDLE, forcing the DeadNodeDetector to sleep for IDLE_SLEEP_MS. So, why do we need deadNodeDetectInterval if the actual time gap between every check is IDLE_SLEEP_MS?
{quote}
The deadNodeDetectInterval check in checkDeadNodes() is not really necessary, since idle() is called after checkDeadNodes().
{quote}
stopDeadNodeDetectorThread.stopDeadNodeDetectorThread() is supposed to stop the deadNodeDetector thread, but it looks like the implementation of the runnable never terminates. DeadNodeDetector suppresses all interrupts and never checks for a termination flag. Therefore, the caller will just hang for 3 seconds waiting to join.
{quote}
{code:java}
  /**
   * Close dead node detector thread.
   */
  public void stopDeadNodeDetectorThread() {
    if (deadNodeDetectorThr != null) {
      deadNodeDetectorThr.interrupt();
      try {
        deadNodeDetectorThr.join(3000);
      } catch (InterruptedException e) {
        LOG.warn("Encountered exception while waiting to join on dead "
            + "node detector thread.", e);
      }
    }
  }
{code}
I will remove the 3s timeout in deadNodeDetectorThr.join() so that the caller waits for the deadNodeDetector thread to stop.

I will create an issue to fix these two problems. Thank you.

> DeadNodeDetector checks dead node periodically
> ----------------------------------------------
>
>                 Key: HDFS-14651
>                 URL: https://issues.apache.org/jira/browse/HDFS-14651
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: HDFS-14651.001.patch, HDFS-14651.002.patch,
> HDFS-14651.003.patch, HDFS-14651.004.patch, HDFS-14651.005.patch,
> HDFS-14651.006.patch, HDFS-14651.007.patch, HDFS-14651.008.patch
>
>
> DeadNodeDetector checks dead node periodically.
> DeadNodeDetector periodically probes the nodes in DeadNodeDetector#deadnode.
> If the access is successful, the node is removed from
> DeadNodeDetector#deadnode. Continuous detection of dead nodes is
> necessary: a DataNode may need to rejoin the cluster after a service
> restart or machine repair, and without a probe mechanism it may be
> permanently excluded.
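
For illustration only, here is a minimal sketch of the two behaviours discussed in this thread: periodically re-probing nodes in a dead set, and a detector loop that treats an interrupt as its termination signal so that join() can return promptly. This is not the HDFS implementation. The class DeadNodeProbeSketch, its methods, and the probe predicate are all hypothetical names introduced for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical sketch; not the actual HDFS DeadNodeDetector. */
class DeadNodeProbeSketch implements Runnable {
  private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();
  private final Predicate<String> probe; // assumed: returns true if the node answers
  private final long probeIntervalMs;

  DeadNodeProbeSketch(Predicate<String> probe, long probeIntervalMs) {
    this.probe = probe;
    this.probeIntervalMs = probeIntervalMs;
  }

  void addDeadNode(String node) {
    deadNodes.add(node);
  }

  boolean isDead(String node) {
    return deadNodes.contains(node);
  }

  /** One probe pass: nodes that answer again leave the dead set. */
  void probeOnce() {
    deadNodes.removeIf(probe);
  }

  @Override
  public void run() {
    // The interrupt flag doubles as the termination flag.
    while (!Thread.currentThread().isInterrupted()) {
      probeOnce();
      try {
        Thread.sleep(probeIntervalMs);
      } catch (InterruptedException e) {
        // Restore the flag instead of swallowing it, so the loop exits.
        Thread.currentThread().interrupt();
      }
    }
  }

  /** Interrupts t and waits up to timeoutMs; returns true if t has stopped. */
  static boolean stop(Thread t, long timeoutMs) {
    t.interrupt();
    try {
      t.join(timeoutMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !t.isAlive();
  }

  public static void main(String[] args) {
    Set<String> reachable = ConcurrentHashMap.newKeySet();
    DeadNodeProbeSketch detector =
        new DeadNodeProbeSketch(reachable::contains, 50);
    detector.addDeadNode("dn1");
    detector.addDeadNode("dn2");
    reachable.add("dn1"); // pretend dn1 was restarted
    detector.probeOnce();
    System.out.println("dn1 dead: " + detector.isDead("dn1"));
    System.out.println("dn2 dead: " + detector.isDead("dn2"));

    Thread t = new Thread(detector, "dead-node-probe");
    t.start();
    System.out.println("stopped: " + stop(t, 3000));
  }
}
```

Because the loop re-checks the interrupt flag and restores it after an interrupted sleep, the join timeout in stop() acts only as a safety bound rather than the normal wait time.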