[ http://issues.apache.org/jira/browse/HADOOP-639?page=comments#action_12446075 ]

Owen O'Malley commented on HADOOP-639:
--------------------------------------
Ok, I propose that we unify the heartbeats with the messages requesting work:

In InterTrackerProtocol:

  TaskTrackerTask[] updateStatus(TaskTrackerStatus status) throws IOException;

replacing emitHeartbeat, pollForNewTask, and pollForTaskWithClosedJob.

The TaskTrackerTask includes:
  new task -- run a new task
  kill task -- stop a currently running task and clean up
  cleanup job -- clean up the map outputs from a given job

The TaskTrackerStatus would gain:
  a counter of the number of TaskTrackerTask lists processed
  amount of free disk space in transient storage

The counter would be used to make sure the last set of commands was received. Thus, it could both replace the init task timeouts and fix this bug.

> task cleanup messages can get lost, causing task trackers to keep tasks forever
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-639
>                 URL: http://issues.apache.org/jira/browse/HADOOP-639
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.7.2
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>             Fix For: 0.8.0
>
>
> If the pollForTaskWithClosedJob call from a job tracker to a task tracker
> times out when a job completes, the tasks are never cleaned up. This can
> cause the mini m/r cluster to hang on shutdown, but also is a resource leak.
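
For illustration only, here is a rough sketch of how the unified call proposed in the comment above might look, and how the counter could let the job tracker notice a lost response and resend it. Only the names InterTrackerProtocol, TaskTrackerTask, TaskTrackerStatus, and updateStatus come from the proposal. The fields, the Kind enum, and the SketchJobTracker class are assumptions made up for this example, and the sketch handles a single task tracker only. It is not the actual patch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// One command from the job tracker to a task tracker.
// The three kinds mirror the list in the proposal; the class shape is assumed.
class TaskTrackerTask {
    enum Kind { NEW_TASK, KILL_TASK, CLEANUP_JOB }

    final Kind kind;
    final String id; // task id or job id (hypothetical representation)

    TaskTrackerTask(Kind kind, String id) {
        this.kind = kind;
        this.id = id;
    }
}

// Status sent with every heartbeat. Field names are hypothetical.
class TaskTrackerStatus {
    final String trackerName;
    final int listsProcessed;      // number of TaskTrackerTask lists processed so far
    final long freeTransientBytes; // free disk space in transient storage

    TaskTrackerStatus(String trackerName, int listsProcessed, long freeTransientBytes) {
        this.trackerName = trackerName;
        this.listsProcessed = listsProcessed;
        this.freeTransientBytes = freeTransientBytes;
    }
}

// The single call that would replace emitHeartbeat, pollForNewTask,
// and pollForTaskWithClosedJob.
interface InterTrackerProtocol {
    TaskTrackerTask[] updateStatus(TaskTrackerStatus status) throws IOException;
}

// Toy job tracker serving exactly one task tracker (an assumption of this sketch).
class SketchJobTracker implements InterTrackerProtocol {
    private final List<TaskTrackerTask> pending = new ArrayList<>();
    private TaskTrackerTask[] lastSent = new TaskTrackerTask[0];
    private int listsSent = 0;

    void enqueue(TaskTrackerTask task) {
        pending.add(task);
    }

    @Override
    public TaskTrackerTask[] updateStatus(TaskTrackerStatus status) {
        // The tracker has processed fewer lists than we sent, so the last
        // response never arrived: send the same list again.
        if (status.listsProcessed < listsSent) {
            return lastSent;
        }
        // Otherwise hand out everything queued since the last acknowledged list.
        lastSent = pending.toArray(new TaskTrackerTask[0]);
        pending.clear();
        listsSent++;
        return lastSent;
    }
}
```

In this sketch a cleanup job command stays in lastSent until the tracker's counter catches up, so a timed-out call would delay the cleanup rather than lose it.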
