[ http://issues.apache.org/jira/browse/HADOOP-362?page=all ]
Owen O'Malley reassigned HADOOP-362:
------------------------------------
Assignee: Owen O'Malley (was: Devaraj Das)
> tasks can get lost when reporting task completion to the JobTracker has an
> error
> --------------------------------------------------------------------------------
>
> Key: HADOOP-362
> URL: http://issues.apache.org/jira/browse/HADOOP-362
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Reporter: Devaraj Das
> Assigned To: Owen O'Malley
> Attachments: lost-status-updates.patch
>
>
> The JobTracker could lose some status updates about successfully completed
> map tasks and would then assume those tasks were still running (the web page
> kept displaying the last progress report it had received). The reduces would
> then keep waiting for map output they would never receive, so the job
> appeared to hang.
>
> The following piece of code sends the status of tasks to the JobTracker:
>
> synchronized (this) {
>     for (Iterator it = runningTasks.values().iterator();
>          it.hasNext(); ) {
>         TaskInProgress tip = (TaskInProgress) it.next();
>         TaskStatus status = tip.createStatus();
>         taskReports.add(status);
>         if (status.getRunState() != TaskStatus.RUNNING) {
>             if (tip.getTask().isMapTask()) {
>                 mapTotal--;
>             } else {
>                 reduceTotal--;
>             }
>             // Removed before the heartbeat is known to have been delivered.
>             it.remove();
>         }
>     }
> }
>
> //
> // Xmit the heartbeat
> //
>
> TaskTrackerStatus status =
>     new TaskTrackerStatus(taskTrackerName, localHostname,
>                           httpPort, taskReports,
>                           failures);
> int resultCode = jobClient.emitHeartbeat(status, justStarted);
>
>
> Notice that the completed TIPs are removed from the runningTasks data
> structure before the heartbeat is sent. If emitHeartbeat throws an exception
> (for example, because it cannot reach the JobTracker before the IPC timeout
> expires), this update is lost: the next heartbeat no longer includes the
> completed task's status, so the JobTracker never learns that the task
> finished. One solution is to remove the completed TIPs from runningTasks only
> after emitHeartbeat returns. Here is what the new code would look like:
>
>
> synchronized (this) {
>     for (Iterator it = runningTasks.values().iterator();
>          it.hasNext(); ) {
>         TaskInProgress tip = (TaskInProgress) it.next();
>         TaskStatus status = tip.createStatus();
>         taskReports.add(status);
>     }
> }
>
> //
> // Xmit the heartbeat
> //
>
> TaskTrackerStatus status =
>     new TaskTrackerStatus(taskTrackerName, localHostname,
>                           httpPort, taskReports,
>                           failures);
> int resultCode = jobClient.emitHeartbeat(status, justStarted);
> // Only reached if emitHeartbeat did not throw: now drop the finished TIPs.
> synchronized (this) {
>     for (Iterator it = runningTasks.values().iterator();
>          it.hasNext(); ) {
>         TaskInProgress tip = (TaskInProgress) it.next();
>         if (tip.runstate != TaskStatus.RUNNING) {
>             if (tip.getTask().isMapTask()) {
>                 mapTotal--;
>             } else {
>                 reduceTotal--;
>             }
>             it.remove();
>         }
>     }
> }
>
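The reordering proposed above can be sketched as a small, self-contained Java model. This is a hypothetical simplification, not the TaskTracker code or the attached patch: `HeartbeatSketch`, `JobTrackerStub`, `State`, `addTask` and `finish` are invented names, only map tasks are counted, and a failed IPC call is modelled as the stub throwing an exception. One deliberate difference from the snippet above: finished tasks are removed according to the state captured in the report that was actually sent, rather than by re-reading the live state as `tip.runstate` does, so a task that completes while the heartbeat is in flight is still reported on the next heartbeat instead of being dropped unreported.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the TaskTracker heartbeat loop.
// Not Hadoop code: all names here are invented for illustration.
class HeartbeatSketch {
    enum State { RUNNING, SUCCEEDED }

    // Stand-in for the IPC call to the JobTracker; throwing models an IPC timeout.
    interface JobTrackerStub {
        void emitHeartbeat(Map<String, State> reports) throws Exception;
    }

    // Task id -> current state; finished tasks stay here until reported.
    final Map<String, State> runningTasks = new LinkedHashMap<>();
    int mapTotal = 0;

    synchronized void addTask(String id) {
        runningTasks.put(id, State.RUNNING);
        mapTotal++;
    }

    synchronized void finish(String id) {
        runningTasks.put(id, State.SUCCEEDED);
    }

    // Returns true if the heartbeat was delivered.
    boolean heartbeat(JobTrackerStub jobTracker) {
        Map<String, State> reports;
        synchronized (this) {
            // Snapshot every tracked task, finished or not; remove nothing yet.
            reports = new LinkedHashMap<>(runningTasks);
        }
        try {
            jobTracker.emitHeartbeat(reports);
        } catch (Exception e) {
            // Delivery failed: runningTasks is untouched, so finished tasks
            // are reported again on the next heartbeat.
            return false;
        }
        synchronized (this) {
            Iterator<Map.Entry<String, State>> it = runningTasks.entrySet().iterator();
            while (it.hasNext()) {
                String id = it.next().getKey();
                // Use the state that was actually sent, not the live state, so a
                // task that finished while the call was in flight is kept.
                if (reports.get(id) == State.SUCCEEDED) {
                    it.remove();
                    mapTotal--;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        HeartbeatSketch tt = new HeartbeatSketch();
        tt.addTask("m1");
        tt.finish("m1");
        boolean ok = tt.heartbeat(r -> { throw new Exception("IPC timeout"); });
        System.out.println(ok + " " + tt.runningTasks); // false {m1=SUCCEEDED}
        ok = tt.heartbeat(r -> System.out.println("sent " + r)); // sent {m1=SUCCEEDED}
        System.out.println(ok + " " + tt.runningTasks); // true {}
    }
}
```

Running `main` shows the first heartbeat failing with `m1` kept in `runningTasks`, then a successful heartbeat after which `m1` is finally removed.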
--
This message is automatically generated by JIRA.