[ https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800120#action_12800120 ]
Amar Kamat commented on MAPREDUCE-1316:
---------------------------------------

Arun, the logging changes will help in debugging memory leaks caused by stale references to TaskInProgress objects. With these changes, exactly one task-removal log line is printed per task, matching the task-addition log line. Any mismatch between task-addition and task-removal log lines should therefore point to a memory leak. That is not true today, because the task-removal log line is printed in removeMarkedTasks() (a caller of removeTaskEntry(), the API responsible for removing a task), and removeMarkedTasks() is not called for every task that has been added to the JobTracker. The new log lines are not inside any loop and will be printed only once per task attempt.

bq. The bug you point to is irrelevant in the current context, i.e. JobInProgress.getTasks(TaskType): '==' or equals is the right implementation.

It looks like hadoop.io serializes enums as strings, so the JVM bug I pointed out does not apply here.

----

MAPREDUCE-1316 was raised because of a mismatch between task-attempt addition and task-attempt removal in the JobTracker. Once a job retires, its tasks are removed based on the task statuses available. However, a task status is added for a task attempt only when the tasktracker reports back with its next heartbeat after the task is assigned. This leaves a corner case in the removal logic: if a tasktracker is assigned a task and the job then finishes, the newly scheduled attempt is added to the JobTracker but never removed, because its status is not yet available.

This patch changes the task-removal logic to iterate over all scheduled/launched attempt IDs instead of the statuses, which takes care of the corner case described above.
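The corner case and the fix can be illustrated with a minimal, hypothetical Java model of a single job. Only the names taskToTIPMap and removeTaskEntry() come from this thread; the class, the other fields and methods, the attempt IDs, and the log text are invented for illustration, and this is not the code from the attached patches. Removing entries by reported status leaves the unreported attempt in the map, while iterating over every launched attempt ID removes it and prints one removal line for each addition line.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, heavily simplified model of the JobTracker bookkeeping for one job.
 * Only taskToTIPMap and removeTaskEntry() are names taken from MAPREDUCE-1316;
 * everything else is illustrative.
 */
public class StaleTaskSketch {
    // attempt id -> owning TIP id (stands in for the real TaskInProgress reference)
    final Map<String, String> taskToTIPMap = new HashMap<>();
    // every attempt handed to a tasktracker, reported or not
    final Set<String> launchedAttempts = new LinkedHashSet<>();
    // attempts whose status arrived with a later heartbeat
    final Set<String> reportedStatuses = new HashSet<>();

    void schedule(String attemptId, String tipId) {
        taskToTIPMap.put(attemptId, tipId);
        launchedAttempts.add(attemptId);
        System.out.println("Adding task " + attemptId); // hypothetical log text
    }

    void report(String attemptId) {
        reportedStatuses.add(attemptId);
    }

    // Logs exactly once per attempt actually removed, mirroring the addition line.
    private void removeTaskEntry(String attemptId) {
        if (taskToTIPMap.remove(attemptId) != null) {
            System.out.println("Removing task " + attemptId); // hypothetical log text
        }
    }

    // Old behaviour as described: only attempts with a status are removed.
    void retireByStatuses() {
        for (String attemptId : reportedStatuses) {
            removeTaskEntry(attemptId);
        }
    }

    // Patched behaviour as described: iterate over every launched attempt id.
    void retireByAttemptIds() {
        for (String attemptId : launchedAttempts) {
            removeTaskEntry(attemptId);
        }
    }

    static StaleTaskSketch jobWithUnreportedAttempt() {
        StaleTaskSketch jt = new StaleTaskSketch();
        jt.schedule("attempt_m_000000_0", "task_m_000000");
        jt.report("attempt_m_000000_0");
        // Assigned just before the job finished; the tasktracker never reports back.
        jt.schedule("attempt_m_000001_0", "task_m_000001");
        return jt;
    }

    public static void main(String[] args) {
        StaleTaskSketch before = jobWithUnreportedAttempt();
        before.retireByStatuses();
        System.out.println("stale entries (status-based): " + before.taskToTIPMap.size());

        StaleTaskSketch after = jobWithUnreportedAttempt();
        after.retireByAttemptIds();
        System.out.println("stale entries (attempt-id-based): " + after.taskToTIPMap.size());
    }
}
```

In the status-based run one entry survives retirement, which in the real JobTracker would keep the TaskInProgress (and thus the JobInProgress) reachable; in the attempt-ID-based run the map is empty and the addition and removal log lines pair up.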
> JobTracker holds stale references to retired jobs via unreported tasks
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1316
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1316
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>            Priority: Blocker
>         Attachments: mapreduce-1316-v1.11.patch,
>                      mapreduce-1316-v1.13-branch20-yahoo.patch,
>                      mapreduce-1316-v1.14-branch20-yahoo.patch,
>                      mapreduce-1316-v1.14.1-branch20-yahoo.patch,
>                      mapreduce-1316-v1.15-branch20-yahoo.patch,
>                      mapreduce-1316-v1.7.patch
>
>
> JobTracker fails to remove _unreported_ tasks' mapping from _taskToTIPMap_ if
> the job finishes and retires. _Unreported tasks_ refers to tasks that were
> scheduled but whose tasktracker did not report back with the task status. In
> such cases a stale reference is held to the TaskInProgress (and thus the
> JobInProgress) long after the job is gone, leading to a memory leak.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.