[
https://issues.apache.org/jira/browse/HDFS-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062771#comment-14062771
]
Andrew Wang commented on HDFS-6688:
-----------------------------------
Hi Biju,
The 10.5 minute dead node timeout has been around for a while, and it's separate
from the heartbeat interval. We want to wait a conservative amount of time before
marking a node as dead, since that will start re-replication for all the blocks
on that DN (very I/O and network intensive).
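For reference, a minimal sketch of where the 10.5 minute figure comes from,
assuming the stock values of dfs.namenode.heartbeat.recheck-interval (5 minutes)
and dfs.heartbeat.interval (3 seconds). The function and constant names are
made up for illustration and are not Hadoop identifiers; check your own
hdfs-site.xml for overrides.

```python
# Sketch only: reproduces the arithmetic behind the dead-node timeout.
# The defaults below are assumptions taken from the usual hdfs-default.xml.
RECHECK_INTERVAL_MS = 5 * 60 * 1000  # dfs.namenode.heartbeat.recheck-interval (ms)
HEARTBEAT_INTERVAL_S = 3             # dfs.heartbeat.interval (seconds)


def dead_node_timeout_ms(recheck_interval_ms=RECHECK_INTERVAL_MS,
                         heartbeat_interval_s=HEARTBEAT_INTERVAL_S):
    """Return the time without heartbeats after which a DN is declared dead."""
    # Two recheck intervals plus ten missed heartbeats.
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


timeout_ms = dead_node_timeout_ms()
print(timeout_ms / 60000.0)  # 10.5 (minutes)
```

Lowering either setting shortens the window, at the cost of kicking off
re-replication sooner when a DN is only briefly unreachable.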
We do measure the "last heartbeat" time in places, and will mark a DN as
"stale" if we haven't heard from it for a little while (e.g. 30s) but it's not
yet dead. You could try looking at those metrics if you're interested in
lower-latency detection methods.
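As a rough illustration of that kind of lower-latency check, the sketch below
classifies datanodes by the age of their last heartbeat. It assumes the
LiveNodes attribute is a JSON string mapping each host to an object with a
"lastContact" field in seconds, and that the stale and dead thresholds are the
30s and 10.5 minute values mentioned above. The helper names and the sample
hosts are made up, and the exact boundary handling is illustrative rather than
what the NameNode does internally.

```python
import json

# Assumed thresholds, in seconds; adjust to match your cluster's settings.
STALE_AFTER_S = 30    # e.g. dfs.namenode.stale.datanode.interval = 30000 ms
DEAD_AFTER_S = 630    # the 10.5 minute dead node timeout


def classify(last_contact_s, stale_after_s=STALE_AFTER_S,
             dead_after_s=DEAD_AFTER_S):
    """Map seconds since the last heartbeat to 'live', 'stale' or 'dead'."""
    if last_contact_s > dead_after_s:
        return "dead"
    if last_contact_s > stale_after_s:
        return "stale"
    return "live"


def classify_live_nodes(live_nodes_json):
    """Classify every host in a LiveNodes-style JSON string (assumed shape)."""
    nodes = json.loads(live_nodes_json)
    return {host: classify(info["lastContact"]) for host, info in nodes.items()}


# Made-up sample data standing in for the value read over JMX.
sample = json.dumps({
    "dn1.example.com": {"lastContact": 2},
    "dn2.example.com": {"lastContact": 95},
})
print(classify_live_nodes(sample))
```

With this approach a stopped DN would show up as stale within about 30 seconds,
well before NumDeadDataNodes changes.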
If this is satisfactory, could we close this JIRA? Thanks Biju.
> Hadoop JMX stats are not refreshed
> ----------------------------------
>
> Key: HDFS-6688
> URL: https://issues.apache.org/jira/browse/HDFS-6688
> Project: Hadoop HDFS
> Issue Type: Bug
> Environment: Ubuntu
> Reporter: Biju Nair
>
> Even when the HDFS datanode process is stopped, the values of the JMX attributes
> Hadoop.NameNode.FSNamesystemState.NumLiveDataNodes/NumDeadDataNodes don't
> change. Also, Hadoop.NameNode.NameNodeInfo.Attributes.LiveNodes still
> shows the stopped datanode's details. If these attributes reflected the actual
> state of the datanodes, they could be used to monitor the health of the HDFS
> cluster, which currently isn't possible.