[
https://issues.apache.org/jira/browse/HDFS-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028269#comment-17028269
]
Lisheng Sun commented on HDFS-14651:
------------------------------------
Thanks [~ahussein] for your questions.
{quote}1. What is the use of deadNodeDetectInterval? As far as I understand,
every call to checkDeadNodes() will change the state to IDLE forcing the
DeadNodeDetector to sleep for IDLE_SLEEP_MS. So, why do we need
deadNodeDetectInterval if the actual time gap between every check is
IDLE_SLEEP_MS?
{quote}
The deadNodeDetectInterval check in checkDeadNodes() is not really necessary,
since idle() is called after checkDeadNodes().
{quote} stopDeadNodeDetectorThread.stopDeadNodeDetectorThread() is supposed to
stop the deadNodeDetector thread; but it looks like the implementation of the
runnable never terminates. DeadNodeDetector suppresses all interrupts and never
checks for a termination flag. Therefore, the caller will just hang for 3
seconds waiting to join.
{quote}
{code:java}
/**
 * Close dead node detector thread.
 */
public void stopDeadNodeDetectorThread() {
  if (deadNodeDetectorThr != null) {
    deadNodeDetectorThr.interrupt();
    try {
      deadNodeDetectorThr.join(3000);
    } catch (InterruptedException e) {
      LOG.warn("Encountered exception while waiting to join on dead " +
          "node detector thread.", e);
    }
  }
}
{code}
I will remove the 3s timeout from deadNodeDetectorThr.join(), which waits for
the deadNodeDetector thread to stop.
I will create an issue to fix these two problems. Thank you.
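
For reference, a minimal sketch of a worker loop that honors interrupts, so
that interrupt() followed by join() returns promptly. StoppableDetector and
its members are hypothetical names used only for illustration; this is not the
Hadoop DeadNodeDetector code and not necessarily how the follow-up issue will
fix it.

```java
/** Hypothetical stand-in for a detector thread; not the Hadoop class. */
class StoppableDetector implements Runnable {
  private final long idleSleepMs;
  private final Thread thread;

  StoppableDetector(long idleSleepMs) {
    this.idleSleepMs = idleSleepMs;
    this.thread = new Thread(this, "dead-node-detector-sketch");
    this.thread.setDaemon(true);
  }

  void start() {
    thread.start();
  }

  @Override
  public void run() {
    // The loop condition doubles as the termination flag.
    while (!Thread.currentThread().isInterrupted()) {
      try {
        // Placeholder for one probing round, followed by the idle sleep.
        Thread.sleep(idleSleepMs);
      } catch (InterruptedException e) {
        // Restore the interrupt status instead of swallowing it,
        // so the loop condition sees it and the thread exits.
        Thread.currentThread().interrupt();
      }
    }
  }

  /** Interrupts the worker and waits; returns true if it has terminated. */
  boolean stop(long joinTimeoutMs) {
    thread.interrupt();
    try {
      thread.join(joinTimeoutMs);
    } catch (InterruptedException e) {
      // Preserve the caller's interrupt status as well.
      Thread.currentThread().interrupt();
    }
    return !thread.isAlive();
  }
}
```

With a loop like this, the join timeout only acts as a safety net: a caller
such as new StoppableDetector(1000).start() followed by stop(3000) should
return as soon as the worker observes the interrupt, not after the full
timeout.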
> DeadNodeDetector checks dead node periodically
> ----------------------------------------------
>
> Key: HDFS-14651
> URL: https://issues.apache.org/jira/browse/HDFS-14651
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14651.001.patch, HDFS-14651.002.patch,
> HDFS-14651.003.patch, HDFS-14651.004.patch, HDFS-14651.005.patch,
> HDFS-14651.006.patch, HDFS-14651.007.patch, HDFS-14651.008.patch
>
>
> DeadNodeDetector checks dead node periodically.
> DeadNodeDetector periodically probes the nodes in DeadNodeDetector#deadnode.
> If the access is successful, the node is removed from
> DeadNodeDetector#deadnode. Continuous detection of dead nodes is
> necessary because a DataNode needs to rejoin the cluster after a service
> restart or machine repair. Without a probe mechanism, the DataNode may be
> permanently excluded.
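
The probing behaviour described in the issue summary can be sketched roughly
as follows. DeadNodeProber, markDead, isDead and probeOnce are hypothetical
names, and the probe is passed in as a plain predicate; this is an
illustration under those assumptions, not the patch attached to the issue.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical sketch of re-probing dead nodes; not the Hadoop implementation. */
class DeadNodeProber {
  // Nodes currently considered dead, keyed by an illustrative node id.
  private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();
  // Returns true when the node can be accessed again.
  private final Predicate<String> probe;

  DeadNodeProber(Predicate<String> probe) {
    this.probe = probe;
  }

  void markDead(String node) {
    deadNodes.add(node);
  }

  boolean isDead(String node) {
    return deadNodes.contains(node);
  }

  /** Runs one probing round and returns how many nodes were recovered. */
  int probeOnce() {
    int recovered = 0;
    // The concurrent key set tolerates removal during iteration.
    for (String node : deadNodes) {
      if (probe.test(node) && deadNodes.remove(node)) {
        recovered++;
      }
    }
    return recovered;
  }
}
```

Calling probeOnce() from a periodic loop is what lets a restarted DataNode
leave the dead set again; without such a round, a node marked dead would stay
excluded.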