[ 
https://issues.apache.org/jira/browse/HDFS-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062771#comment-14062771
 ] 

Andrew Wang commented on HDFS-6688:
-----------------------------------

Hi Biju,

The 10.5 minute dead node timeout has been around for a while, and it's separate 
from the heartbeat interval. We want to wait a conservative amount of time before 
marking a node as dead, since that will start re-replication for all the blocks 
on that DN (very I/O and network intensive).
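
For reference, the 10.5 minutes falls out of two settings. Below is a rough 
sketch of the arithmetic, assuming the stock defaults for 
dfs.namenode.heartbeat.recheck-interval (5 minutes) and dfs.heartbeat.interval 
(3 seconds). Please check hdfs-default.xml for your version, since the property 
names and defaults are quoted from memory, and the function name is only for 
illustration.

```python
def dead_node_timeout_seconds(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Seconds without a heartbeat before the NameNode treats a DN as dead.

    Assumed formula: 2 * recheck interval + 10 * heartbeat interval.
    The default arguments are the assumed stock HDFS defaults.
    """
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s


timeout_s = dead_node_timeout_seconds()
print(timeout_s / 60)  # 10.5 (minutes)
```

Lowering either setting shortens the timeout, at the cost of starting 
re-replication sooner after a transient outage.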

We do measure the "last heartbeat" time in places, and will mark a DN as 
"stale" if we haven't heard from it for a little while (e.g. 30s) but it's not 
yet dead. You could try looking at those metrics if you're interested in 
lower-latency detection methods.
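
As one possible approach, a monitoring script could look at the per-node last 
contact time instead of waiting for the dead count to change. The sketch below 
assumes the LiveNodes attribute is a JSON string mapping each datanode to an 
object with a "lastContact" field in seconds; verify that against your 
NameNode's JMX output. The helper name, the 30s threshold and the sample data 
are made up for illustration.

```python
import json


def slow_datanodes(live_nodes_json, max_last_contact_s=30):
    """Return {host: lastContact} for DNs not heard from within the threshold.

    Assumes live_nodes_json is a JSON object keyed by datanode, where each
    value carries a "lastContact" field in seconds (an assumption about the
    LiveNodes attribute format).
    """
    nodes = json.loads(live_nodes_json)
    slow = {}
    for host, info in nodes.items():
        last_contact = info.get("lastContact", 0)
        if last_contact > max_last_contact_s:
            slow[host] = last_contact
    return slow


# Made-up sample standing in for the LiveNodes attribute value.
sample = json.dumps({
    "dn1.example.com": {"lastContact": 2},
    "dn2.example.com": {"lastContact": 95},
})
print(slow_datanodes(sample))  # {'dn2.example.com': 95}
```

A stopped datanode would show up here well before the 10.5 minute dead node 
timeout expires.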

If this is satisfactory, could we close this JIRA? Thanks Biju.

> Hadoop JMX stats are not refreshed
> ----------------------------------
>
>                 Key: HDFS-6688
>                 URL: https://issues.apache.org/jira/browse/HDFS-6688
>             Project: Hadoop HDFS
>          Issue Type: Bug
>         Environment: Ubuntu
>            Reporter: Biju Nair
>
> Even when the HDFS datanode process is stopped, the values of the JMX 
> attributes Hadoop.NameNode.FSNamesystemState.NumLiveDataNodes/NumDeadDataNodes 
> don't change. Also, Hadoop.NameNode.NameNodeInfo.Attributes.LiveNodes still 
> shows the stopped datanode details. If these attributes reflected the actual 
> state of the datanodes, they could be used to monitor the health of the HDFS 
> cluster, which is currently not possible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
