[ https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799554#action_12799554 ]
Hemanth Yamijala commented on MAPREDUCE-1316: --------------------------------------------- I looked at the patch for the Yahoo! distribution. I have a few comments: - Thinking a little more if we can address my concern about decoupling knowledge of task types and the removeJobTasks API, can we refactor the code in removeJobTasks as follows: {code} JobInProgress { Map<TaskType, TaskInProgress[]> getAllTIPsByType() { Map<TaskType, TaskInProgress[]> tipsByType = new HashMap<TaskType, TaskInProgress[]>(); tipsByType.put(TaskType.SETUP, getSetupTasks()); ... } } JobTracker { removeJobTasks() { Map<TaskType, TaskInProgress[]> tipsByType = job.getAllTIPsByType(); for (TaskInProgress[] allTIPs : tipsByType.values()) { for (TaskInProgress tip : allTIPs) { for (TaskAttemptID id : tip.getAllTaskAttemptIDs()) { removeTaskEntry(id); } } } } } {code} I feel this has the benefits of keeping knowledge of task types in the JIP, and also removing the duplication of code in removeJobTasks. Thoughts ? - I think we can enhance the mock test in TestJobRetire to add all types of tasks so that all loops of removeJobTasks are covered. - Further, it seems to me like the mock test is a regression test as well. As the status is not set for the task attempts you add, doesn't it mean that the old code would actually leave behind entries in the map reproducing the leak ? If yes, I suppose we can keep the end to end test using MiniMR temporarily until MAPREDUCE-1154 is addressed. But after that, I would assume such a test makes a lot more sense following approaches mentioned in MAPREDUCE-1154. We can inject altering behavior (like aborting the heartbeat, etc) in a more conventional manner. If you agree to this, can you please just add a comment to the end to end test case to the effect that this can be moved to a cluster test post MAPREDUCE-1154 ? > JobTracker holds stale references to retired jobs via unreported tasks > ----------------------------------------------------------------------- > > Key: MAPREDUCE-1316 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1316 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker > Reporter: Amar Kamat > Assignee: Amar Kamat > Priority: Blocker > Attachments: mapreduce-1316-v1.11.patch, > mapreduce-1316-v1.13-branch20-yahoo.patch, mapreduce-1316-v1.7.patch > > > JobTracker fails to remove _unreported_ tasks' mapping from _taskToTIPMap_ if > the job finishes and retires. _Unreported tasks_ refers to tasks that were > scheduled but the tasktracker did not report back with the task status. In > such cases a stale reference is held to TaskInProgress (and thus > JobInProgress) long after the job is gone leading to memory leak. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.