[ https://issues.apache.org/jira/browse/HADOOP-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542530 ]
Devaraj Das commented on HADOOP-1900:
-------------------------------------

Owen, by what you suggested, it appears that the heartbeat interval would be 2 seconds for all cluster configurations with more than 20 nodes. This seems way too much. After some thought, I tend to agree with Owen that backoff may be difficult to control. So here is a simplified proposal:

1) Monitor the average time we take to process an RPC.

2) Assuming that every RPC can be processed within millisecond(s), the average number of RPCs that the server can process per minute (RPC-processed-per-minute) is (60000 / time-per-rpc). Assuming time-per-rpc is ~10 msec, ~6000 RPCs can be processed in a minute. Since the heartbeat RPC invocation locks the JobTracker, the number of handlers doesn't actually matter much.

3) The heartbeat interval should be (clustersize / RPC-processed-per-minute) minutes. For example, if ClusterSize = 1000, the heartbeat interval is set to 1000/6000 min = 10 sec.

4) taskCompletionEvents: this RPC is treated no differently than the heartbeat RPC. In addition to regular polling, this RPC also happens on demand, i.e., a TaskTracker invokes this RPC whenever a ReduceTask asks for MapCompletionEvents and the TaskTracker has nothing to give back (a lower cap of 5 seconds is set between two on-demand RPCs). This is similar to the way heartbeat RPCs work: whenever tasks finish, the TaskTracker sends a heartbeat.

What do others think?
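As a rough illustration of the arithmetic in steps 2 and 3, here is a minimal sketch. The class and method names are hypothetical and are not taken from the attached patch. Note that (clustersize / RPC-processed-per-minute) minutes reduces to clustersize * time-per-rpc milliseconds, which is what the second method computes.

```java
// Hypothetical sketch of the proposed interval calculation; not Hadoop code.
final class HeartbeatIntervalSketch {
    static final long MILLIS_PER_MINUTE = 60_000L;

    // Step 2: RPCs the server can process per minute, given the
    // measured average processing time of one RPC in milliseconds.
    static double rpcsPerMinute(double avgRpcMillis) {
        if (avgRpcMillis <= 0) {
            throw new IllegalArgumentException("avgRpcMillis must be positive");
        }
        return MILLIS_PER_MINUTE / avgRpcMillis;
    }

    // Step 3: interval = clusterSize / rpcsPerMinute minutes.
    // Converted to milliseconds this is clusterSize * avgRpcMillis.
    static long heartbeatIntervalMillis(int clusterSize, double avgRpcMillis) {
        if (clusterSize < 0 || avgRpcMillis <= 0) {
            throw new IllegalArgumentException("invalid cluster size or RPC time");
        }
        return Math.round(clusterSize * avgRpcMillis);
    }

    public static void main(String[] args) {
        System.out.println(rpcsPerMinute(10.0));                 // 6000.0
        System.out.println(heartbeatIntervalMillis(1000, 10.0)); // 10000
    }
}
```

With time-per-rpc at 10 msec and 1000 nodes this gives 10000 msec, matching the 10 sec in the example above. The sketch applies no lower or upper bound to the result, since the proposal does not specify one.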
> the heartbeat and task event queries interval should be set dynamically by the JobTracker
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1900
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1900
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sri Ramadasu
>         Attachments: patch-1900.txt, patch-1900.txt
>
>
> The JobTracker should scale the intervals that the TaskTrackers use to
> contact it dynamically, based on how busy it is and the size of the
> cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.