[https://issues.apache.org/jira/browse/HADOOP-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596658#action_12596658]
Zheng Shao commented on HADOOP-3386:
------------------------------------
I am thinking about taking steps 2 and 3 from the solution proposed in
HADOOP-3370. (Note: in the HADOOP-3370 patch I only implemented step 1, so
that the critical fix could get out as soon as possible.)
Proposed solution from HADOOP-3370:
1. When a task fails, remove the task from runningJobs, but do not delete the
job's entry in runningJobs, even if the failed task was the job's only task on
this TaskTracker (which means we should NOT call TaskTracker.removeTaskFromJob).
2. The JobTracker should keep another data structure, jobsToTracker, that
records all the TaskTrackers on which a job has started a task.
3. When the job finishes, the JobTracker sends a "KILL" job command to every
TaskTracker recorded for that job in jobsToTracker.
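A minimal Java sketch of steps 2 and 3 follows, assuming job and tracker IDs are
plain strings. The class JobsToTrackerSketch, its methods, and the "KILL_JOB"
action string are hypothetical stand-ins, not Hadoop's actual JobTracker API,
and step 1 (the TaskTracker side) is not shown. The point it illustrates is that
a tracker is recorded when a task is launched on it, so a tracker that only ever
ran failed tasks still receives the kill-job action.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified sketch of steps 2 and 3; NOT the real
 * org.apache.hadoop.mapred.JobTracker code. IDs are plain strings.
 */
public class JobsToTrackerSketch {
    // Step 2: job ID -> every TaskTracker that started a task of the job,
    // regardless of whether that task later succeeded or failed.
    private final Map<String, Set<String>> jobsToTracker = new HashMap<>();

    // Actions queued per tracker, delivered in its next heartbeat response.
    private final Map<String, List<String>> pendingActions = new HashMap<>();

    /** Called whenever a task of jobId is assigned to trackerName. */
    public void recordTaskLaunch(String jobId, String trackerName) {
        jobsToTracker.computeIfAbsent(jobId, k -> new HashSet<>()).add(trackerName);
    }

    /** Step 3: on job completion, queue a kill-job action for every recorded tracker. */
    public void jobCompleted(String jobId) {
        Set<String> trackers = jobsToTracker.remove(jobId);
        if (trackers == null) {
            return; // no task of this job was ever launched
        }
        for (String tracker : trackers) {
            pendingActions.computeIfAbsent(tracker, k -> new ArrayList<>())
                          .add("KILL_JOB " + jobId);
        }
    }

    /** Returns and clears the actions to piggyback on trackerName's heartbeat response. */
    public List<String> heartbeat(String trackerName) {
        List<String> actions = pendingActions.remove(trackerName);
        return actions == null ? new ArrayList<>() : actions;
    }

    public static void main(String[] args) {
        JobsToTrackerSketch jt = new JobsToTrackerSketch();
        jt.recordTaskLaunch("job_0001", "tracker_a");
        // tracker_b only ran a task that failed, but it is still recorded.
        jt.recordTaskLaunch("job_0001", "tracker_b");
        jt.jobCompleted("job_0001");
        System.out.println(jt.heartbeat("tracker_b")); // prints [KILL_JOB job_0001]
        System.out.println(jt.heartbeat("tracker_b")); // prints []
    }
}
```

Removing the job's entry from the map in jobCompleted keeps the structure from
growing without bound once jobs finish.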
> the job directory of a failed task may stay forever on a tasktracker node
> -------------------------------------------------------------------------
>
> Key: HADOOP-3386
> URL: https://issues.apache.org/jira/browse/HADOOP-3386
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Zheng Shao
>
> See https://issues.apache.org/jira/browse/HADOOP-3370 for details of the
> problem.
> A tasktracker only cleans out the job dir when the job tracker sends a
> "KILLJOB" action in the heartbeat response message.
> However, in one corner case the job tracker will NOT send the "KILLJOB" action
> to the task tracker: when all of this job's tasks on this task tracker have
> failed and none of them succeeded.
> In this case, jobtracker.trackerToTaskMap will not map this task tracker to
> any task of this job. As a result, the job tracker will not send a KILLJOB
> action to the task tracker, and the job directory is never cleaned up.