[ https://issues.apache.org/jira/browse/HDFS-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076503#comment-14076503 ]
Colin Patrick McCabe commented on HDFS-6764:
--------------------------------------------
It seems to make sense to skip a heartbeat interval if the NN took too long to
respond to the previous heartbeat. Adding a random factor (even just a few
milliseconds) is probably also a good idea. It also seems like we should have a
metric on the DN for how many heartbeats took longer than the expected
heartbeat interval... I took a look at the MXBean / MBean stuff but couldn't
find anything like that.
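A minimal sketch of the three ideas in the comment, under stated assumptions: this is a hypothetical per-DN scheduler class, not the real DataNode heartbeat code. It reads "skip a heartbeat interval" as "skip any schedule slots that passed while waiting on the NN", so each DN keeps its own phase instead of firing immediately. It adds 0..maxJitterMs of random delay, and it counts heartbeats whose response took longer than one interval in an AtomicLong that stands in for the proposed DN metric (wiring it into Hadoop's metrics system is not shown). The class, method, and parameter names are all assumptions.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch, not HDFS code: a per-DN heartbeat scheduler that
 * skips missed intervals, adds random jitter, and counts slow heartbeats.
 */
public class HeartbeatScheduler {
    private final long intervalMs;
    private final int maxJitterMs;
    private final Random random;
    private final AtomicLong slowHeartbeats = new AtomicLong(); // stand-in for a DN metric
    private long nextSlotMs; // unjittered slot; keeps this DN's phase fixed

    public HeartbeatScheduler(long firstSlotMs, long intervalMs, int maxJitterMs, Random random) {
        if (intervalMs <= 0 || maxJitterMs < 0) {
            throw new IllegalArgumentException("interval must be > 0 and jitter >= 0");
        }
        this.nextSlotMs = firstSlotMs;
        this.intervalMs = intervalMs;
        this.maxJitterMs = maxJitterMs;
        this.random = random;
    }

    /**
     * Call when the NN's response to a heartbeat sent at sentMs arrives at
     * nowMs; returns the time at which the next heartbeat should be sent.
     */
    public long onHeartbeatResponse(long sentMs, long nowMs) {
        if (nowMs - sentMs > intervalMs) {
            slowHeartbeats.incrementAndGet(); // response took longer than one interval
        }
        nextSlotMs += intervalMs;
        // Skip slots that passed while waiting on the NN instead of firing
        // immediately, so DNs stalled together do not resume in lockstep.
        while (nextSlotMs <= nowMs) {
            nextSlotMs += intervalMs;
        }
        // Small random delay (0..maxJitterMs inclusive) on top of the slot.
        return nextSlotMs + random.nextInt(maxJitterMs + 1);
    }

    public long slowHeartbeatCount() {
        return slowHeartbeats.get();
    }

    public static void main(String[] args) {
        // Replays one DN (phase 1200 ms, 3 s interval, no jitter) through a NN
        // stall that holds the response to the 10200 ms heartbeat until 20000 ms.
        HeartbeatScheduler dn = new HeartbeatScheduler(1_200, 3_000, 0, new Random(42));
        long sent = 1_200;
        long[] responses = {1_205, 4_205, 7_205, 20_000};
        for (long responded : responses) {
            long next = dn.onHeartbeatResponse(sent, responded);
            System.out.println("sent " + sent + ", next " + next);
            sent = next;
        }
        System.out.println("slow heartbeats: " + dn.slowHeartbeatCount());
    }
}
```

In this replay the DN resumes at 22200 ms rather than 20000 ms and records one slow heartbeat, so it keeps its 1200 ms offset within the 3 s interval. The trade-off of this reading is up to one extra interval of delay after a slow response; the jitter then adds a little further spread on top of the preserved phase.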
> DN heartbeats may become clumped together
> -----------------------------------------
>
> Key: HDFS-6764
> URL: https://issues.apache.org/jira/browse/HDFS-6764
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Daryn Sharp
> Attachments: Screen Shot 2014-07-28 at 11.12.06 AM.png
>
>
> DNs send heartbeats on a fixed schedule based on the last time a heartbeat
> was sent. If the NN takes longer to respond than the heartbeat interval, then
> DNs do not sleep until the next interval. Instead, another heartbeat is
> immediately sent and all DNs begin heartbeating on the same schedule.
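To make the failure mode in the description concrete, here is a toy Java simulation of the schedule it describes. It is hypothetical and does not use the real DataNode classes. It assumes a 3 s heartbeat interval, five DNs started at staggered offsets, a synchronous heartbeat RPC, and a NN that stalls from 10 s to 20 s and answers every queued heartbeat when it recovers; these numbers and all names are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy simulation (hypothetical model, not HDFS code) of heartbeats scheduled
 * from the last send time, with an immediate resend when the NN is slow.
 */
public class HeartbeatClumpDemo {
    static final long INTERVAL_MS = 3_000;      // assumed heartbeat interval
    static final long STALL_START_MS = 10_000;  // assumed NN stall window
    static final long STALL_END_MS = 20_000;
    static final long RPC_LATENCY_MS = 5;       // assumed normal round trip

    /** Time at which the NN's response to a heartbeat sent at sendMs arrives. */
    static long responseTime(long sendMs) {
        if (sendMs >= STALL_START_MS && sendMs < STALL_END_MS) {
            return STALL_END_MS; // NN answers everything queued when the stall ends
        }
        return sendMs + RPC_LATENCY_MS;
    }

    /** Send times for one DN: next = lastSent + interval, or immediately if already late. */
    static List<Long> sendTimes(long offsetMs, long untilMs) {
        List<Long> sends = new ArrayList<>();
        long send = offsetMs;
        while (send < untilMs) {
            sends.add(send);
            long responded = responseTime(send); // heartbeat RPC blocks until the response
            // If the response arrived after the scheduled time, send right away.
            send = Math.max(send + INTERVAL_MS, responded);
        }
        return sends;
    }

    public static void main(String[] args) {
        int dataNodes = 5;
        for (int i = 0; i < dataNodes; i++) {
            long offset = i * INTERVAL_MS / dataNodes; // staggered start, 600 ms apart
            System.out.println("DN" + i + " " + sendTimes(offset, 27_000));
        }
    }
}
```

Under these assumptions every DN's list ends with 20000, 23000, 26000; for example DN2 prints `[1200, 4200, 7200, 10200, 20000, 23000, 26000]`. The 600 ms stagger that existed before the stall is gone afterwards, and nothing in this schedule ever restores it.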
--
This message was sent by Atlassian JIRA
(v6.2#6252)