[
https://issues.apache.org/jira/browse/MAPREDUCE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213567#comment-13213567
]
Liyin Liang commented on MAPREDUCE-3892:
----------------------------------------
If both a non-speculative and a speculative attempt are launched for a map
task, runningMapTasks is incremented twice.
{code:}
synchronized void addRunningTaskToTIP(TaskInProgress tip, TaskAttemptID id,
                                      TaskTrackerStatus tts,
                                      boolean isScheduled) {
  ...
  } else if (tip.isMapTask()) {
    ++runningMapTasks;
    ...
}
{code}
When the non-speculative task completes, runningMapTasks is decremented by one.
{code:}
public synchronized boolean completedTask(TaskInProgress tip,
                                          TaskStatus status)
{
  ...
  } else if (tip.isMapTask()) {
    runningMapTasks -= 1;
    ...
}
{code}
Then the speculative task is killed and marked as KILLED_UNCLEAN, and a
task-cleanup task is launched.
When the task-cleanup task completes, runningMapTasks is decremented by one
again.
{code:}
private void failedTask(TaskInProgress tip, TaskAttemptID taskid,
                        TaskStatus status,
                        TaskTracker taskTracker, boolean wasRunning,
                        boolean wasComplete, boolean wasAttemptRunning) {
  ...
  if (tip.isMapTask() && !metricsDone) {
    runningMapTasks -= 1;
    ...
}
{code}
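To make the bookkeeping so far concrete, here is a minimal sketch of the path where the task-cleanup task finishes normally. The class RunningMapCounterSketch and its methods are hypothetical stand-ins, not the real JobInProgress code; they only mirror the increments and decrements quoted above, and the !metricsDone check is left out for brevity.

```java
// Hypothetical model of the counter bookkeeping; NOT the real JobInProgress.
class RunningMapCounterSketch {
    private int runningMapTasks = 0;

    // Mirrors addRunningTaskToTIP: every launched map attempt bumps the counter.
    void launchAttempt() { ++runningMapTasks; }

    // Mirrors completedTask: the winning attempt finished.
    void completeAttempt() { runningMapTasks -= 1; }

    // Mirrors failedTask: the killed attempt's task-cleanup task finished.
    // (The real method also checks !metricsDone; omitted in this sketch.)
    void cleanupFinished() { runningMapTasks -= 1; }

    int running() { return runningMapTasks; }

    // Two launches, one completion, one finished cleanup.
    static int healthyPath() {
        RunningMapCounterSketch job = new RunningMapCounterSketch();
        job.launchAttempt();    // attempt_0 (non-speculative)
        job.launchAttempt();    // attempt_1 (speculative)
        job.completeAttempt();  // attempt_0 wins
        job.cleanupFinished();  // cleanup for attempt_1 completes
        return job.running();
    }

    public static void main(String[] args) {
        System.out.println("runningMapTasks after healthy path: " + healthyPath());
    }
}
```

In this path the two increments are matched by two decrements, so the counter returns to zero.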
However, if the TaskTracker on which the task-cleanup task is running is lost,
runningMapTasks is not decremented. This is because:
{code:}
void lostTaskTracker(TaskTracker taskTracker) {
  ...
  if (!tip.isComplete() ||
      (tip.isMapTask() && !tip.isJobSetupTask() &&
       job.desiredReduces() != 0)) {
    ...
}
{code}
In this case, *tip.isComplete()* returns true and *job.desiredReduces()* equals
zero (map-only job), so the condition is false and the guarded block is
skipped. This means that for this map task we decremented runningMapTasks only
once, whereas we incremented it twice. As a result, the runningMapTasks
counter is incorrect.
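The guard can be checked in isolation with a small sketch. The class LostTrackerGuardSketch and the method guard are hypothetical; the parameters stand in for tip.isComplete(), tip.isMapTask(), tip.isJobSetupTask() and job.desiredReduces(), and only the boolean shape of the condition is taken from the snippet above.

```java
// Hypothetical helper that reproduces only the boolean shape of the
// condition quoted from lostTaskTracker; it is not part of Hadoop.
class LostTrackerGuardSketch {
    static boolean guard(boolean tipComplete, boolean isMapTask,
                         boolean isJobSetupTask, int desiredReduces) {
        return !tipComplete
            || (isMapTask && !isJobSetupTask && desiredReduces != 0);
    }

    public static void main(String[] args) {
        // Map-only job, TIP already complete: guard is false, block is skipped.
        System.out.println(guard(true, true, false, 0));   // false
        // Same TIP in a job that has reduces: guard is true.
        System.out.println(guard(true, true, false, 1));   // true
        // TIP not yet complete: guard is true regardless of reduces.
        System.out.println(guard(false, true, false, 0));  // true
    }
}
```

Only the first combination, a completed map TIP in a job with zero reduces, skips the block, which matches the scenario described above.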
> runningMapTasks counter is not decremented in case of failed task-cleanup
> tasks.
> --------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3892
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3892
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv1
> Affects Versions: 1.0.0
> Environment: Hadoop 1.0.0
> Reporter: Liyin Liang
>
> For a map-only job, a map task has two running attempts: attempt_0 and
> attempt_1 (speculative task). If attempt_0 completes first, then this map
> task is complete and attempt_1 is killed. Also, a task-cleanup task will be
> launched to clean attempt_1's output. If the TaskTracker is lost while the
> task-cleanup task is running, attempt_1 will remain in KILLED_UNCLEAN
> status. What's more, runningMapTasks equals one instead of zero after the job
> is finished.
> The incorrect runningMapTasks value can lead to bad scheduling decisions with
> our scheduler.