[
https://issues.apache.org/jira/browse/HADOOP-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542530
]
Devaraj Das commented on HADOOP-1900:
-------------------------------------
Owen, with what you suggested, it appears that the heartbeat interval would be
2 seconds for every cluster configuration with more than 20 nodes. That seems
far too frequent for large clusters.
After some thought, I tend to agree with Owen that backoff may be difficult to
control. So here is a simplified proposal:
1) Monitor the average time we take to process an RPC.
2) Assuming that every RPC can be processed within a few milliseconds, the
average number of RPCs that the server can process per minute
(RPC-processed-per-minute) is (60000 / time-per-rpc), with time-per-rpc in
milliseconds. Assuming time-per-rpc is ~10 msec, ~6000 RPCs can be processed
in a minute. Since the heartbeat RPC invocation locks the JobTracker, the
number of handlers doesn't actually matter much.
3) The heartbeat interval should be (clustersize/RPC-processed-per-minute)
minutes.
For example, if ClusterSize = 1000, the heartbeat interval is set to 1000/6000
min = 10 sec.
4) taskCompletionEvents: this RPC is treated no differently from the heartbeat
RPC. In addition to regular polling, this RPC also happens on demand, i.e., a
TaskTracker invokes this RPC whenever a ReduceTask asks for MapcompletionEvents
and the TaskTracker has nothing to give back (a minimum gap of 5 seconds is
enforced between two on-demand RPCs). This is similar to the way heartbeat RPCs
work: whenever tasks finish, the TaskTracker sends a heartbeat.
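As a rough illustration of steps 1-3 (a sketch only, not the patch attached to
this issue; the names RpcTimer and heartbeat_interval_seconds are made up for
the example, and a plain running average is assumed for step 1), the
calculation could look like this in Python:

```python
class RpcTimer:
    """Running average of RPC processing time in milliseconds (step 1)."""

    def __init__(self):
        self.count = 0
        self.total_millis = 0.0

    def record(self, millis):
        # Called once per processed RPC with its measured duration.
        self.count += 1
        self.total_millis += millis

    def average_millis(self):
        if self.count == 0:
            raise ValueError("no RPCs recorded yet")
        return self.total_millis / self.count


def heartbeat_interval_seconds(cluster_size, avg_rpc_millis):
    """Heartbeat interval from cluster size and average RPC time (steps 2-3)."""
    if avg_rpc_millis <= 0:
        raise ValueError("avg_rpc_millis must be positive")
    # Step 2: RPCs the server can process per minute.
    rpcs_per_minute = 60_000 / avg_rpc_millis
    # Step 3: interval in minutes, converted to seconds.
    interval_minutes = cluster_size / rpcs_per_minute
    return interval_minutes * 60
```

With an average of 10 msec per RPC and 1000 nodes this gives roughly 10
seconds, matching the example in step 3. Note that the formula alone yields
very short intervals for small clusters, so a real implementation would
presumably also need a lower bound.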
What do others think?
> the heartbeat and task event queries interval should be set dynamically by
> the JobTracker
> -----------------------------------------------------------------------------------------
>
> Key: HADOOP-1900
> URL: https://issues.apache.org/jira/browse/HADOOP-1900
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Reporter: Owen O'Malley
> Assignee: Amareshwari Sri Ramadasu
> Attachments: patch-1900.txt, patch-1900.txt
>
>
> The JobTracker should scale the intervals that the TaskTrackers use to
> contact it dynamically, based on how busy it is and the size of the
> cluster.