[ 
https://issues.apache.org/jira/browse/HADOOP-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688271#action_12688271
 ] 

Amareshwari Sriramadasu commented on HADOOP-5548:
-------------------------------------------------

As Devaraj pointed out, the problem is not with JobTracker restart.
In JobTracker, each TaskTrackerStatus is cached in {{taskTrackers}} and is 
supposed to be read-only. However, it is passed to the updateTaskStatuses() 
method, which hands its task reports (TaskStatus objects) to JobInProgress. 
JobInProgress.updateTaskStatus() and tip.updateStatus() then modify the 
TaskStatus object.
The code in TaskInProgress that modifies the TaskStatus reference:
{code}
    if (!isCleanupAttempt(taskid)) {
      taskStatuses.put(taskid, status);
    } else {
      taskStatuses.get(taskid).statusUpdate(status.getRunState(),
        status.getProgress(), status.getStateString(), status.getPhase(),
        status.getFinishTime());
    }
{code}

This could make the total count negative in the following scenario:
Tracker1 reports that task *t_0* is KILLED_UNCLEAN. 
Tracker2 is given the cleanup attempt for t_0.
Tracker2 reports that it is running the cleanup attempt for t_0. This updates 
the entry in taskStatuses, which still holds the TaskStatus object from 
tracker1's cached status.
The JT then calculates the total count as if the task were being run by both 
trackers, which leads to negative totals.

Cloning the TaskStatus object before passing it to JobInProgress (JIP) looks 
like the correct solution. 
Thoughts?
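
To make the aliasing concrete, here is a minimal, self-contained sketch of the scenario above. It does not use the real Hadoop classes: {{Status}}, {{Tip}}, and {{cachedStateAfterCleanup()}} are hypothetical stand-ins for TaskStatus, TaskInProgress, and the JobTracker call path, and the cleanup check is reduced to a boolean parameter. It only illustrates how an in-place statusUpdate() can alter the object cached for tracker1, and how cloning before the hand-off would avoid that.

```java
import java.util.HashMap;
import java.util.Map;

public class TaskStatusAliasingSketch {

    // Hypothetical stand-in for TaskStatus; only the fields needed here.
    static class Status implements Cloneable {
        final String trackerName;
        String runState;

        Status(String trackerName, String runState) {
            this.trackerName = trackerName;
            this.runState = runState;
        }

        // Mutates this object in place, like the statusUpdate() call quoted above.
        void statusUpdate(String newRunState) {
            this.runState = newRunState;
        }

        @Override
        public Status clone() {
            try {
                return (Status) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    // Hypothetical stand-in for TaskInProgress and its taskStatuses map.
    static class Tip {
        final Map<String, Status> taskStatuses = new HashMap<>();

        void updateStatus(String taskid, Status status, boolean isCleanupAttempt) {
            if (!isCleanupAttempt) {
                taskStatuses.put(taskid, status);   // stores the caller's reference
            } else {
                taskStatuses.get(taskid).statusUpdate(status.runState);
            }
        }
    }

    // Replays the scenario and returns the run state of tracker1's cached status.
    static String cachedStateAfterCleanup(boolean cloneBeforePassing) {
        Status cachedForTracker1 = new Status("tracker1", "KILLED_UNCLEAN");
        Tip tip = new Tip();
        tip.updateStatus("t_0",
            cloneBeforePassing ? cachedForTracker1.clone() : cachedForTracker1, false);

        Status fromTracker2 = new Status("tracker2", "RUNNING");
        tip.updateStatus("t_0",
            cloneBeforePassing ? fromTracker2.clone() : fromTracker2, true);

        return cachedForTracker1.runState;
    }

    public static void main(String[] args) {
        System.out.println(cachedStateAfterCleanup(false)); // RUNNING
        System.out.println(cachedStateAfterCleanup(true));  // KILLED_UNCLEAN
    }
}
```

Without the clone, tracker1's supposedly read-only cached status ends up reporting the task as running, which matches the double counting described in the scenario. With the clone, the cached copy is left untouched.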

> Observed negative running maps on the job tracker
> -------------------------------------------------
>
>                 Key: HADOOP-5548
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5548
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>
> We saw the following in both the web UI and the CLI tools:
> {noformat}
> Cluster Summary (Heap Size is 11.7 GB/13.37 GB)
> Maps  Reduces  Total        Nodes  Map Task  Reduce Task  Avg.        Blacklisted
>                Submissions         Capacity  Capacity     Tasks/Node  Nodes
> -971  0        133          434    1736      1736         8.00        0
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.