[ https://issues.apache.org/jira/browse/HADOOP-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539067 ]
Amareshwari Sri Ramadasu commented on HADOOP-1900:
--------------------------------------------------

In the attached patch, the JobTracker periodically recalculates the heartbeat interval. It looks at both the cluster size and how busy the JobTracker is. If the JobTracker is busy, the interval is incremented by a busyFactor. If it is not busy for two consecutive periods, the interval is decremented by the busyFactor.

The map events polling interval is calculated as a function of the heartbeat interval, so it does not need a separate recalculation. It is calculated as follows:

polling_interval = heartbeat_interval/3;
if polling_interval < MIN_POLLING_INTERVAL, then polling_interval = MIN_POLLING_INTERVAL;
if polling_interval > MAX_POLLING_INTERVAL, then polling_interval = MAX_POLLING_INTERVAL;

MapEventsFetcherThread is notified if a reduce task doesn't find map events at the TaskTracker.

bq. I propose a change to the status message in the heartbeat - the tasktracker can compare the current task status with the previous one and if it finds the status to be the same, it doesn't send the complete status object to the JobTracker, but just a flag saying it is a duplicate or something to that effect. That will reduce the data per RPC considerably for long running tasks whose statuses don't change frequently and also reduce the processing load on the JobTracker.

This will be addressed in another JIRA.

> the heartbeat and task event queries interval should be set dynamically by
> the JobTracker
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1900
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1900
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sri Ramadasu
>         Attachments: patch-1900.txt
>
>
> The JobTracker should scale the intervals that the TaskTrackers use to
> contact it dynamically, based on how busy it is and the size of the
> cluster.
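For readers following the comment above, here is a minimal Java sketch of the two calculations it describes: the busyness-based adjustment of the heartbeat interval and the clamped polling interval. It is not the code in patch-1900.txt. The class name IntervalSketch, all constant values, the lower bound on the heartbeat interval, and the reset of the idle counter after a decrement are assumptions made for illustration. The cluster-size input mentioned in the comment is left out because the comment does not say how it is used.

```java
/** Illustrative sketch only; not taken from patch-1900.txt. */
final class IntervalSketch {
    // Assumed values in milliseconds. The comment names these two constants
    // but does not give their values.
    static final int MIN_POLLING_INTERVAL = 1000;
    static final int MAX_POLLING_INTERVAL = 5000;
    // Hypothetical: step size and lower bound for the heartbeat interval.
    static final int BUSY_FACTOR = 1000;
    static final int MIN_HEARTBEAT_INTERVAL = 3000;

    private int heartbeatInterval;
    private int idlePeriods = 0; // consecutive periods in which the tracker was not busy

    IntervalSketch(int initialHeartbeatInterval) {
        this.heartbeatInterval = initialHeartbeatInterval;
    }

    /** polling_interval = heartbeat_interval / 3, clamped to [MIN, MAX]. */
    static int pollingInterval(int heartbeatInterval) {
        int polling = heartbeatInterval / 3;
        return Math.max(MIN_POLLING_INTERVAL, Math.min(MAX_POLLING_INTERVAL, polling));
    }

    /** Called once per recalculation period; returns the new heartbeat interval. */
    int recalculate(boolean busy) {
        if (busy) {
            heartbeatInterval += BUSY_FACTOR;
            idlePeriods = 0;
        } else if (++idlePeriods >= 2) {
            // Two consecutive non-busy periods: step the interval back down.
            heartbeatInterval = Math.max(MIN_HEARTBEAT_INTERVAL, heartbeatInterval - BUSY_FACTOR);
            idlePeriods = 0; // assumption: start counting idle periods afresh
        }
        return heartbeatInterval;
    }

    public static void main(String[] args) {
        IntervalSketch sketch = new IntervalSketch(5000);
        int heartbeat = sketch.recalculate(true);
        System.out.println("heartbeat=" + heartbeat + " polling=" + pollingInterval(heartbeat));
    }
}
```

Deriving the polling interval from the heartbeat interval means only one value has to be tracked and adjusted over time; the clamp keeps reduce tasks from polling too aggressively on small clusters or too rarely on large ones.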
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.