[ https://issues.apache.org/jira/browse/MAPREDUCE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969572#action_12969572 ]
Joydeep Sen Sarma commented on MAPREDUCE-2214:
----------------------------------------------
I think what happened in our case was something like this:
# The task was requested to be killed.
# The TT performed the kill action and reported back to the JT.
# But the task reported back as done, at which point the TT promptly moved it into the SUCCEEDED state.
# Meanwhile, the JT scheduled a cleanup attempt, and the cleanup failed to launch without returning the slot.
The criss-crossing of #2 and #3 was, I think, what was unexpected (something the
code doesn't anticipate).
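The ordering above can be replayed as a single-threaded toy model, which makes the leak visible. This is a sketch under stated assumptions: a TaskTracker with one slot in total, killing the attempt returns that slot, and the cleanup launcher reserves a slot before checking the task state. `KillDoneRace` and its methods are hypothetical and are not Hadoop's TaskTracker code.

```java
// Toy replay of the kill/done criss-cross. State names follow the comment;
// the class, fields and methods are hypothetical, not Hadoop's TaskTracker.
class KillDoneRace {
    enum State { RUNNING, KILLED_UNCLEAN, SUCCEEDED }

    State state = State.RUNNING;
    int freeSlots = 0; // one slot in total, currently held by the running attempt

    // Steps 1-2: kill requested and performed; assumed to return the slot.
    void performKill() {
        state = State.KILLED_UNCLEAN;
        freeSlots++;
    }

    // Step 3: the late "done" report is accepted and overwrites the kill.
    void reportDone() {
        state = State.SUCCEEDED;
    }

    // Step 4: the cleanup attempt reserves a slot, then the launch is refused
    // because the state is no longer KILLED_UNCLEAN. The slot is never returned.
    boolean launchCleanup() {
        freeSlots--;                      // slot reserved before the state check
        if (state != State.KILLED_UNCLEAN) {
            return false;                 // refused: the reserved slot leaks
        }
        return true;                      // actual cleanup launch omitted
    }

    public static void main(String[] args) {
        KillDoneRace tt = new KillDoneRace();
        tt.performKill();
        tt.reportDone();
        boolean launched = tt.launchCleanup();
        System.out.println("launched=" + launched + " freeSlots=" + tt.freeSlots);
    }
}
```

Running it prints `launched=false freeSlots=0`: nothing is running on the tracker, yet its only slot is gone.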
We don't hit this problem with speculation because we never request speculation
when the task is about to complete: there's a check on the task's remaining time,
and if the remaining time is less than N minutes, we don't speculate. (There's a
JIRA for this; I don't remember which.)
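The speculation guard mentioned above might look roughly like the following. The estimator (remaining time extrapolated from the progress rate observed so far), the 5-minute value for N, and all names are assumptions for illustration, not the actual Hadoop check or the JIRA referred to.

```java
// Hypothetical guard: do not speculate an attempt that is estimated to finish soon.
class SpeculationGuard {
    // Assumed estimator: the attempt keeps its average progress rate so far,
    // so remaining = elapsed * (1 - progress) / progress.
    static long estimatedRemainingMs(long elapsedMs, double progress) {
        if (progress <= 0.0) return Long.MAX_VALUE; // no progress yet: treat as far from done
        if (progress >= 1.0) return 0L;
        return (long) (elapsedMs * (1.0 - progress) / progress);
    }

    // Speculate only if the estimated remaining time is at least N.
    static boolean shouldSpeculate(long elapsedMs, double progress, long minRemainingMs) {
        return estimatedRemainingMs(elapsedMs, progress) >= minRemainingMs;
    }

    public static void main(String[] args) {
        long n = 5 * 60_000L; // N = 5 minutes (assumed value)
        // 10 min elapsed at 90%: roughly 67 s left, so no speculation.
        System.out.println(shouldSpeculate(600_000L, 0.9, n)); // false
        // 10 min elapsed at 20%: roughly 40 min left, so speculation is allowed.
        System.out.println(shouldSpeculate(600_000L, 0.2, n)); // true
    }
}
```

Because a nearly finished attempt is never speculated, a speculative kill and a "done" report rarely race, which is consistent with the observation that speculation does not trigger the problem.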
> TaskTracker should release slot if task is not launched
> -------------------------------------------------------
>
> Key: MAPREDUCE-2214
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2214
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.20.1
> Reporter: Ramkumar Vadali
> Assignee: Ramkumar Vadali
>
> TaskTracker.TaskInProgress.launchTask() does not launch a task if it is not
> in an expected state. However, when the task is not launched, the slot is
> not released. We have observed this in production: the task was in the
> SUCCEEDED state by the time launchTask() got to it, and the slot was never
> released. It is not clear how the task got into that state, but it is
> better to handle the case.
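A minimal sketch of the behaviour the issue asks for: when launchTask() declines to launch, the slot the caller already reserved is handed back. The accepted states (UNASSIGNED, FAILED_UNCLEAN, KILLED_UNCLEAN), the `SlotPool` class, and the method signatures are assumptions for illustration; this is not the committed patch.

```java
// Hypothetical model of "release the slot if the task is not launched".
class ReleaseOnSkip {
    enum State { UNASSIGNED, FAILED_UNCLEAN, KILLED_UNCLEAN, RUNNING, SUCCEEDED }

    // Hypothetical counter of free slots on one TaskTracker.
    static final class SlotPool {
        private int free;
        SlotPool(int free) { this.free = free; }
        synchronized void acquire() {
            if (free == 0) throw new IllegalStateException("no free slot");
            free--;
        }
        synchronized void release() { free++; }
        synchronized int freeCount() { return free; }
    }

    static final class TaskInProgress {
        State state;
        TaskInProgress(State state) { this.state = state; }

        // Returns true if launched. The caller has already acquired a slot.
        boolean launchTask(SlotPool slots) {
            if (state == State.UNASSIGNED
                    || state == State.FAILED_UNCLEAN
                    || state == State.KILLED_UNCLEAN) {
                state = State.RUNNING; // actual launch omitted
                return true;
            }
            // Not launched (e.g. already SUCCEEDED): give the slot back.
            slots.release();
            return false;
        }
    }

    public static void main(String[] args) {
        SlotPool slots = new SlotPool(1);
        TaskInProgress tip = new TaskInProgress(State.SUCCEEDED);
        slots.acquire();
        boolean launched = tip.launchTask(slots);
        System.out.println("launched=" + launched + " free=" + slots.freeCount());
    }
}
```

With the task already in SUCCEEDED, this prints `launched=false free=1`, so the slot is available again. Releasing inside launchTask() keeps the reserve/return pairing next to the state check; an alternative is for the caller to release the slot whenever launchTask() returns false.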
--
This message is automatically generated by JIRA.