[ 
https://issues.apache.org/jira/browse/HADOOP-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596658#action_12596658
 ] 

Zheng Shao commented on HADOOP-3386:
------------------------------------

I am thinking about taking steps 2 and 3 from the solution proposed in 3370. 
(Note: in the 3370 patch I only did step 1, to make sure the critical fix could 
get out as soon as possible.)

Proposed solution from 3370:
1. On a failed task, remove the task from runningJobs, but do not delete the 
job's runningJobs entry even if it is the only task of the job (which means we 
should NOT call TaskTracker.removeTaskFromJob).

2. The JobTracker should keep another data structure, jobsToTracker, recording 
all the TaskTrackers on which a job has started a task.

3. When the job finishes, the JobTracker will send a "KILL" job command to 
those TaskTrackers, based on the jobsToTracker data structure.
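
Steps 2 and 3 could be sketched roughly as below. This is only an illustration 
and not code from any patch: the class name, the method names and the use of 
plain strings for job ids and tracker names are all assumptions. Only the name 
jobsToTracker comes from the proposal.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of steps 2 and 3; not actual JobTracker code. */
class JobsToTrackerSketch {
    // jobId -> names of all TaskTrackers on which the job has started a task,
    // regardless of whether those tasks later succeeded or failed.
    private final Map<String, Set<String>> jobsToTracker = new HashMap<>();

    /** Step 2: remember the tracker whenever a task of the job is started on it. */
    synchronized void taskStarted(String jobId, String trackerName) {
        jobsToTracker.computeIfAbsent(jobId, k -> new TreeSet<>()).add(trackerName);
    }

    /**
     * Step 3: when the job finishes, return every tracker that should be told
     * to kill the job (and so clean up its job directory), then forget the job.
     */
    synchronized Set<String> jobFinished(String jobId) {
        Set<String> trackers = jobsToTracker.remove(jobId);
        return trackers == null ? Collections.<String>emptySet() : trackers;
    }
}
```

Because the set is filled when a task starts rather than when it succeeds, a 
tracker that only ever ran failed tasks of the job is still included.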



> the job directory of a failed task may stay forever on a tasktracker node
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-3386
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3386
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Zheng Shao
>
> See https://issues.apache.org/jira/browse/HADOOP-3370 for details of the 
> problem.
> A tasktracker only cleans out the job dir when the job tracker sends a 
> "KILLJOB" action in the heartbeat response message.
> However, in a corner case, the job tracker will NOT send the "KILLJOB" action 
> to the task tracker. The case is when there are only failed tasks of this job 
> on this task tracker and no successful tasks of this job are on it.
> In this case, jobtracker.trackerToTaskMap will not map this task tracker to 
> any task of this job. As a result, the job tracker will not send a KILLJOB 
> action to the task tracker.
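
The corner case in the quoted description can be modelled with a toy example. 
This is a simplified sketch under assumptions, not the real JobTracker logic: 
only the name trackerToTaskMap is taken from the description, while the class, 
the methods and the "<jobId>/<taskName>" task id format are made up for 
illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the corner case; not actual JobTracker code. */
class KillJobCornerCase {
    // trackerName -> ids of the tasks still tracked on that tracker.
    // Task ids are assumed to look like "<jobId>/<taskName>".
    private final Map<String, Set<String>> trackerToTaskMap = new HashMap<>();

    void taskStarted(String tracker, String taskId) {
        trackerToTaskMap.computeIfAbsent(tracker, k -> new HashSet<>()).add(taskId);
    }

    /** Assumption: a failed task is dropped from the map, as the description implies. */
    void taskFailed(String tracker, String taskId) {
        Set<String> tasks = trackerToTaskMap.get(tracker);
        if (tasks != null) {
            tasks.remove(taskId);
        }
    }

    /** KILLJOB is sent only if the map still links this tracker to a task of the job. */
    boolean wouldSendKillJob(String tracker, String jobId) {
        Set<String> tasks = trackerToTaskMap.get(tracker);
        if (tasks == null) {
            return false;
        }
        for (String taskId : tasks) {
            if (taskId.startsWith(jobId + "/")) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, a tracker whose only task of the job has failed gets no KILLJOB 
when the job finishes, so its job directory is never cleaned out.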

