[
https://issues.apache.org/jira/browse/MAPREDUCE-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002757#comment-13002757
]
Dick King commented on MAPREDUCE-2355:
--------------------------------------
The reason we need this is that if many jobs have short tasks, the job tracker
can be overwhelmed by too many heartbeats.
I think that the patch should have two pieces.
1: In any one node, we should delay an out-of-band heartbeat that we are
considering sending if it would otherwise occur too soon after the most
recent heartbeat, in the hope of reporting multiple task attempt completions
in one heartbeat, thus reducing the total load placed on the job tracker. This
is a trade-off, because the node won't get a new task immediately.
2: We should cap the total number of heartbeats over a time interval. The cap
and the interval should be configurable. If that interval is INT and the cap
is C, we should track the times of the last C heartbeats we sent. If the
time T of the oldest one is less than INT ago and we otherwise meet the
criteria for sending a heartbeat, we should unconditionally send one at time
T + INT rather than immediately.
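
To make the two pieces above concrete, here is a rough sketch in Java. It is
not the actual patch and does not use any real TaskTracker API: the class name
HeartbeatDamper, the parameter minGapMs (the "too soon" threshold from piece 1)
and the two methods are all hypothetical names chosen for illustration. It
assumes times are in milliseconds and that the caller records every heartbeat
it actually sends.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the two-piece damper; not the MAPREDUCE-2355 patch. */
class HeartbeatDamper {
    private final long minGapMs;   // piece 1: minimum spacing after the most recent heartbeat
    private final int cap;         // piece 2: C, max heartbeats per interval
    private final long intervalMs; // piece 2: INT
    // Send times of the last (at most) C heartbeats, oldest first.
    private final Deque<Long> sentTimes = new ArrayDeque<>();

    HeartbeatDamper(long minGapMs, int cap, long intervalMs) {
        if (cap < 1) {
            throw new IllegalArgumentException("cap must be >= 1");
        }
        this.minGapMs = minGapMs;
        this.cap = cap;
        this.intervalMs = intervalMs;
    }

    /** Earliest time at which a heartbeat wanted at nowMs may be sent. */
    long earliestSendTime(long nowMs) {
        long sendAt = nowMs;
        if (!sentTimes.isEmpty()) {
            // Piece 1: do not send too soon after the most recent heartbeat.
            sendAt = Math.max(sendAt, sentTimes.peekLast() + minGapMs);
        }
        if (sentTimes.size() == cap) {
            // Piece 2: the oldest of the last C heartbeats was sent at T,
            // so the next one may not go out before T + INT.
            sendAt = Math.max(sendAt, sentTimes.peekFirst() + intervalMs);
        }
        return sendAt;
    }

    /** Record a heartbeat that was actually sent at sentAtMs. */
    void recordSent(long sentAtMs) {
        sentTimes.addLast(sentAtMs);
        if (sentTimes.size() > cap) {
            sentTimes.removeFirst(); // keep only the last C send times
        }
    }
}
```

With minGapMs = 100, C = 3 and INT = 1000, heartbeats recorded at 0, 100 and
250 mean a heartbeat wanted at time 400 is held until 1000 (T + INT with
T = 0), which is the "longish delay" that piece 2 can introduce.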
Since principle 2 may induce a longish delay, perhaps each heartbeat should say
when the next heartbeat should occur? That makes this patch a bigger deal,
because up to now all changes could be localized to the TaskTracker and with
this addition they can't, but it might be worthwhile.
> Add an out of band heartbeat damper
> -----------------------------------
>
> Key: MAPREDUCE-2355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2355
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobtracker
> Reporter: Owen O'Malley
> Assignee: Arun C Murthy
>
> We should have a configurable knob to throttle how many out of band
> heartbeats are sent.