[ https://issues.apache.org/jira/browse/HDFS-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076503#comment-14076503 ]

Colin Patrick McCabe commented on HDFS-6764:
--------------------------------------------

It seems to make sense to skip a heartbeat interval if the NN took too long to
respond to the previous heartbeat.  Adding a random factor (even just a few
milliseconds) is probably also a good idea.  It also seems like we should have
a metric on the DN counting how many heartbeats took longer than the expected
heartbeat interval.  I looked through the existing MXBean / MBean metrics but
couldn't find anything like that.
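Here is a rough sketch of what the scheduling change and counter could look like, in Java to match the DN. All of it is hypothetical: `HeartbeatScheduler`, `nextHeartbeatTime`, and `getLateHeartbeats` are made-up names, not the real DataNode code. "Skip" is interpreted here as jumping to the next interval boundary after the NN's reply, dropping every interval that was missed. Between 0 and `maxJitterMs` ms of random jitter is then added. The late-heartbeat count is just a field; a real metric would have to be wired into the DN's metrics system.

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical DN heartbeat scheduler (not the actual DataNode code).
 * Skips missed intervals instead of sending immediately, adds random
 * jitter, and counts heartbeats whose reply arrived at or after the
 * time the next heartbeat was due.
 */
class HeartbeatScheduler {
  private final long intervalMs;
  private final long maxJitterMs;
  private long lateHeartbeats = 0;

  public HeartbeatScheduler(long intervalMs, long maxJitterMs) {
    if (intervalMs <= 0 || maxJitterMs < 0) {
      throw new IllegalArgumentException("interval must be > 0, jitter >= 0");
    }
    this.intervalMs = intervalMs;
    this.maxJitterMs = maxJitterMs;
  }

  /**
   * @param lastSentMs when the previous heartbeat was sent
   * @param replyMs    when the NN's reply to it came back
   * @return when the next heartbeat should be sent
   */
  public long nextHeartbeatTime(long lastSentMs, long replyMs) {
    long next = lastSentMs + intervalMs;
    if (replyMs >= next) {
      // The reply came back at or after the next heartbeat was due:
      // count it, then jump to the first interval boundary after the reply
      // instead of sending right away (which is what lines DNs up).
      lateHeartbeats++;
      long missed = (replyMs - lastSentMs) / intervalMs;
      next = lastSentMs + (missed + 1) * intervalMs;
    }
    // Uniform jitter in [0, maxJitterMs] so DNs drift apart over time.
    long jitter = maxJitterMs > 0
        ? ThreadLocalRandom.current().nextLong(maxJitterMs + 1)
        : 0;
    return next + jitter;
  }

  public long getLateHeartbeats() {
    return lateHeartbeats;
  }

  public static void main(String[] args) {
    HeartbeatScheduler s = new HeartbeatScheduler(3000, 0);
    System.out.println(s.nextHeartbeatTime(0, 100));  // 3000: reply was on time
    System.out.println(s.nextHeartbeatTime(0, 7500)); // 9000: 3000 and 6000 skipped
    System.out.println(s.getLateHeartbeats());        // 1
  }
}
```

Because the next time is computed from the actual last send time, the jitter makes the effective period slightly longer than `intervalMs` on average, which slowly spreads the DNs' phases apart rather than letting them stay aligned.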

> DN heartbeats may become clumped together
> -----------------------------------------
>
>                 Key: HDFS-6764
>                 URL: https://issues.apache.org/jira/browse/HDFS-6764
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Daryn Sharp
>         Attachments: Screen Shot 2014-07-28 at 11.12.06 AM.png
>
>
> DNs send heartbeats on a fixed schedule based on the last time a heartbeat 
> was sent.  If the NN takes longer to respond than the heartbeat interval then 
> DNs do not sleep until the next interval.  Instead, another heartbeat is 
> immediately sent and all DNs begin heartbeating on the same schedule.
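To make the mechanism in the description concrete, here is a toy simulation in Java. The NN model is an assumption for illustration only. Replies are instant, except that any heartbeat sent during a stall window is answered when the stall ends. `HeartbeatClumpingSim` and its constants are hypothetical. Each simulated DN schedules its next heartbeat at `lastSent + interval`, but sends immediately if the reply comes back later than that.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation of the HDFS-6764 scheduling: next heartbeat at
 * lastSent + interval, or immediately if the NN's reply arrives later.
 */
class HeartbeatClumpingSim {
  static final long INTERVAL_MS = 3000;
  // Assumed NN stall: heartbeats sent in [START, END) are answered at END.
  static final long STALL_START_MS = 5000;
  static final long STALL_END_MS = 20000;

  /** Time at which the NN replies to a heartbeat sent at sentMs. */
  static long replyTime(long sentMs) {
    boolean duringStall = sentMs >= STALL_START_MS && sentMs < STALL_END_MS;
    return duringStall ? STALL_END_MS : sentMs;
  }

  /** Heartbeat send times for a DN whose first heartbeat is at offsetMs. */
  static List<Long> sendTimes(long offsetMs, long horizonMs) {
    List<Long> sends = new ArrayList<>();
    long sent = offsetMs;
    while (sent <= horizonMs) {
      sends.add(sent);
      long reply = replyTime(sent);
      long next = sent + INTERVAL_MS;
      // If the reply is overdue, the DN sends again as soon as it arrives.
      sent = Math.max(reply, next);
    }
    return sends;
  }

  public static void main(String[] args) {
    for (long offset : new long[] {0, 1000, 2000}) {
      System.out.println("DN offset " + offset + ": " + sendTimes(offset, 26000));
    }
  }
}
```

The three simulated DNs start 1 s apart, but each one's heartbeat during the stall is answered at 20000 ms, so after that they all send at 20000, 23000, and 26000 and stay in lockstep.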



--
This message was sent by Atlassian JIRA
(v6.2#6252)
