[
https://issues.apache.org/jira/browse/HDFS-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077877#comment-14077877
]
Daryn Sharp commented on HDFS-6764:
-----------------------------------
Great minds think alike. Skipping missed intervals + entropy is exactly what
Nathan and I considered as the solution. We haven't had a chance to verify.
The entropy required is probably a random delay for the first heartbeat.
That'll spread out DNs that all connected and blocked while the NN is blocked
during something like lengthy BR processing.
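
A minimal sketch of the "skip missed intervals + entropy" idea, written in Python rather than the DN's Java and not verified against the actual code. The class and method names (HeartbeatScheduler, is_due, schedule_next) are hypothetical; the only assumptions are a fixed heartbeat interval and a clock in seconds.

```python
import random


class HeartbeatScheduler:
    """Hypothetical scheduler; illustrative only, not the HDFS DataNode API."""

    def __init__(self, interval_s, now_s, rng=random):
        self.interval_s = interval_s
        # Entropy: delay the first heartbeat by a random amount in [0, interval]
        # so DNs that connect at the same moment start out of phase.
        self.next_heartbeat_s = now_s + rng.uniform(0, interval_s)

    def is_due(self, now_s):
        return now_s >= self.next_heartbeat_s

    def schedule_next(self, now_s):
        """Call once a heartbeat has completed; returns the next send time."""
        # Count whole intervals that elapsed while the call was blocked and
        # skip them, instead of sending again immediately.
        missed = int((now_s - self.next_heartbeat_s) // self.interval_s)
        self.next_heartbeat_s += (max(missed, 0) + 1) * self.interval_s
        return self.next_heartbeat_s
```

Because the next send time always stays on the DN's own grid (first heartbeat time plus a multiple of the interval), a long NN stall shifts every DN forward by whole intervals and the initial random phase survives the stall.
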
The more interesting part of the puzzle, which the aforementioned changes will
probably mask/fix, is what causes heartbeats to an active NN to clump together in
a semi-rhythmic cycle? I suspect full BRs but I think I saw a similar jagged
pattern on another cluster... Will double check when I have time.
> DN heartbeats may become clumped together
> -----------------------------------------
>
> Key: HDFS-6764
> URL: https://issues.apache.org/jira/browse/HDFS-6764
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Daryn Sharp
> Attachments: Screen Shot 2014-07-28 at 11.12.06 AM.png
>
>
> DNs send heartbeats on a fixed schedule based on the last time a heartbeat
> was sent. If the NN takes longer to respond than the heartbeat interval then
> DNs do not sleep until the next interval. Instead, another heartbeat is
> immediately sent and all DNs begin heartbeating on the same schedule.
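
The clumping described above can be reproduced with a toy model. This is a sketch under stated assumptions, not the DN's actual code: next_send_time is a hypothetical helper, times are in seconds, and the NN stall is modelled as every outstanding heartbeat call returning at the same instant.

```python
def next_send_time(last_sent_s, interval_s, now_s):
    """Schedule keyed on the last send time: if the deadline has already
    passed when the previous call returns, send again immediately."""
    return max(last_sent_s + interval_s, now_s)


interval = 3.0
# Three DNs that started out nicely staggered.
last_sent = [0.0, 1.0, 2.0]
# The NN stalls; every outstanding heartbeat call returns at t=10.
unblocked_at = 10.0

resend = [next_send_time(t, interval, unblocked_at) for t in last_sent]
# resend == [10.0, 10.0, 10.0]: all three DNs now share one schedule.

following = [next_send_time(t, interval, t) for t in resend]
# following == [13.0, 13.0, 13.0]: the clump persists on later heartbeats.
```
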
--
This message was sent by Atlassian JIRA
(v6.2#6252)