[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799554#action_12799554
 ] 

Hemanth Yamijala commented on MAPREDUCE-1316:
---------------------------------------------

I looked at the patch for the Yahoo! distribution. I have a few comments:

- Thinking a little more if we can address my concern about decoupling 
knowledge of task types and the removeJobTasks API,  can we refactor the code 
in removeJobTasks as follows:

{code}
JobInProgress {
  Map<TaskType, TaskInProgress[]> getAllTIPsByType() {
    Map<TaskType, TaskInProgress[]> tipsByType =
      new HashMap<TaskType, TaskInProgress[]>();
    tipsByType.put(TaskType.SETUP, getSetupTasks());
    ...
  }
}

JobTracker {

  removeJobTasks() {
    Map<TaskType, TaskInProgress[]> tipsByType = 
      job.getAllTIPsByType();
    for (TaskInProgress[] allTIPs : tipsByType.values()) {
      for (TaskInProgress tip : allTIPs) {
        for (TaskAttemptID id : tip.getAllTaskAttemptIDs()) {
          removeTaskEntry(id);
        }
      }
    }
  }
}
{code}

I feel this has the benefits of keeping knowledge of task types in the JIP, and 
also removing the duplication of code in removeJobTasks. Thoughts ?

- I think we can enhance the mock test in TestJobRetire to add all types of 
tasks so that all loops of removeJobTasks are covered.
- Further, it seems to me like the mock test is a regression test as well. As 
the status is not set for the task attempts you add, doesn't it mean that the 
old code would actually leave behind entries in the map reproducing the leak ? 
If yes, I suppose we can keep the end to end test using MiniMR temporarily 
until MAPREDUCE-1154 is addressed. But after that, I would assume such a test 
makes a lot more sense following approaches mentioned in MAPREDUCE-1154. We can 
inject altering behavior (like aborting the heartbeat, etc) in a more 
conventional manner. If you agree to this, can you please just add a comment to 
the end to end test case to the effect that this can be moved to a cluster test 
post MAPREDUCE-1154 ?

> JobTracker holds stale references to retired jobs via unreported tasks 
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1316
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1316
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>            Priority: Blocker
>         Attachments: mapreduce-1316-v1.11.patch, 
> mapreduce-1316-v1.13-branch20-yahoo.patch, mapreduce-1316-v1.7.patch
>
>
> JobTracker fails to remove _unreported_ tasks' mapping from _taskToTIPMap_ if 
> the job finishes and retires. _Unreported tasks_ refers to tasks that were 
> scheduled but the tasktracker did not report back with the task status. In 
> such cases a stale reference is held to TaskInProgress (and thus 
> JobInProgress) long after the job is gone leading to memory leak.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to