[ 
https://issues.apache.org/jira/browse/HADOOP-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542530
 ] 

Devaraj Das commented on HADOOP-1900:
-------------------------------------

Owen, by what you suggested, it appears that the heartbeat interval would be 2 
seconds for all cluster configurations with more than 20 nodes. This seems way 
too much.

After some thought, I tend to agree with Owen that backoff may be 
difficult to control. So here is a simplified proposal:

1) Monitor the average time we take to process an RPC

2) Assuming that every RPC can be processed within a few milliseconds, the 
average number of RPCs that the server can process per minute 
(RPC-processed-per-minute) is (60000 / time-per-rpc), with time-per-rpc in 
milliseconds. Assuming time-per-rpc is ~10 msec, ~6000 RPCs can be processed 
in a minute. Since the heartbeat RPC invocation locks the JobTracker, the 
number of handlers actually doesn't matter much.

3) The heartbeat interval should be (clustersize/RPC-processed-per-minute) 
minutes.
For example, if ClusterSize = 1000, the heartbeat interval is set to 1000/6000 
min = 10 sec.

4) taskCompletionEvents : this RPC is treated no differently than the heartbeat 
RPC. In addition to regular polling, this RPC also happens on demand, i.e., a 
TaskTracker invokes this RPC whenever a ReduceTask asks for map completion 
events and the TaskTracker has nothing to give back (a minimum gap of 5 seconds 
is enforced between two on-demand RPCs). This is similar to the way heartbeat 
RPCs work - whenever tasks finish, the TaskTracker sends a heartbeat.
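
To make the arithmetic in steps 1-3 and the on-demand limit in step 4 
concrete, here is a minimal sketch. It is written in Python only for brevity 
and is not taken from any patch on this issue; the class, function, and 
constant names are all hypothetical, and a plain running average is assumed 
for time-per-rpc.

```python
# Illustrative sketch only: names below are hypothetical, not Hadoop APIs.

MIN_ON_DEMAND_GAP_SECONDS = 5.0  # minimum gap between on-demand RPCs (step 4)


class HeartbeatIntervalEstimator:
    """Tracks the average RPC processing time and derives a heartbeat interval."""

    def __init__(self):
        self.total_rpc_millis = 0.0
        self.rpc_count = 0

    def record_rpc(self, millis):
        # Step 1: accumulate the time spent processing each RPC.
        self.total_rpc_millis += millis
        self.rpc_count += 1

    def average_rpc_millis(self):
        # Returns None until at least one RPC has been observed.
        if self.rpc_count == 0:
            return None
        return self.total_rpc_millis / self.rpc_count

    def heartbeat_interval_seconds(self, cluster_size):
        avg = self.average_rpc_millis()
        if avg is None or avg <= 0:
            return None
        # Step 2: RPCs the server can process per minute.
        rpcs_per_minute = 60000.0 / avg
        # Step 3: interval in minutes, converted to seconds.
        interval_minutes = cluster_size / rpcs_per_minute
        return interval_minutes * 60.0


def may_poll_on_demand(now_seconds, last_on_demand_seconds):
    """Step 4: allow an on-demand RPC only if the minimum gap has elapsed."""
    if last_on_demand_seconds is None:
        return True
    return now_seconds - last_on_demand_seconds >= MIN_ON_DEMAND_GAP_SECONDS
```

With an average of 10 msec per RPC and a cluster size of 1000, 
heartbeat_interval_seconds(1000) comes out at about 10 seconds, matching the 
example in step 3.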

What do others think?

> the heartbeat and task event queries interval should be set dynamically by 
> the JobTracker
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1900
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1900
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sri Ramadasu
>         Attachments: patch-1900.txt, patch-1900.txt
>
>
> The JobTracker should scale the intervals that the TaskTrackers use to 
> contact it dynamically, based on how busy it is and the size of the 
> cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
