[ https://issues.apache.org/jira/browse/HDFS-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076503#comment-14076503 ]
Colin Patrick McCabe commented on HDFS-6764:
--------------------------------------------

It seems to make sense to skip a heartbeat interval if the NN took too long to respond to the previous heartbeat. Adding a random factor (even only a few milliseconds) is probably also a good idea.

Seems like we should also have a metric on the DN for how many heartbeats took longer than the expected heartbeat interval... I took a look at the MXBean / MBean stuff but couldn't find anything like that.

> DN heartbeats may become clumped together
> -----------------------------------------
>
>                 Key: HDFS-6764
>                 URL: https://issues.apache.org/jira/browse/HDFS-6764
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Daryn Sharp
>         Attachments: Screen Shot 2014-07-28 at 11.12.06 AM.png
>
>
> DNs send heartbeats on a fixed schedule based on the last time a heartbeat
> was sent. If the NN takes longer to respond than the heartbeat interval then
> DNs do not sleep until the next interval. Instead, another heartbeat is
> immediately sent and all DNs begin heartbeating on the same schedule.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
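
For illustration, the three ideas in the comment above (skip an interval after a slow NN response, add a small random factor, and count slow heartbeats) could be sketched roughly as follows. This is not the actual DataNode code: the class `HeartbeatScheduler`, its fields, and its method names are hypothetical, and "skip" is interpreted here as sleeping one full interval measured from the time the late response arrived.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch only; names and structure are assumptions, not HDFS code. */
class HeartbeatScheduler {
    private final long intervalMs;   // expected heartbeat interval
    private final long maxJitterMs;  // upper bound of the random factor (may be 0)
    private final AtomicLong slowHeartbeats = new AtomicLong();

    HeartbeatScheduler(long intervalMs, long maxJitterMs) {
        this.intervalMs = intervalMs;
        this.maxJitterMs = maxJitterMs;
    }

    /**
     * Computes when the next heartbeat should be sent.
     *
     * @param sentMs time the previous heartbeat was sent
     * @param nowMs  time the NN response to that heartbeat was received
     */
    long scheduleNext(long sentMs, long nowMs) {
        long next;
        if (nowMs - sentMs >= intervalMs) {
            // The NN took at least a full interval to respond. Count it and
            // wait a full interval from now instead of sending immediately.
            slowHeartbeats.incrementAndGet();
            next = nowMs + intervalMs;
        } else {
            // Normal case: fixed schedule based on the last send time.
            next = sentMs + intervalMs;
        }
        // Small random offset in [0, maxJitterMs] so DNs drift apart.
        long jitter = ThreadLocalRandom.current().nextLong(maxJitterMs + 1);
        return next + jitter;
    }

    /** Candidate value for a DN-side metric on slow heartbeats. */
    long getSlowHeartbeatCount() {
        return slowHeartbeats.get();
    }
}
```

With a 3000 ms interval and no jitter, a heartbeat sent at t=3000 whose response arrives at t=7000 would be counted as slow, and the next heartbeat would be scheduled for t=10000 rather than sent right away. How the counter would be exposed (for example through an MXBean) is left open here.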