[ 
https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568437#action_12568437
 ] 

Devaraj Das commented on HADOOP-2119:
-------------------------------------

BTW, if we take the sparse matrix approach, we really don't need the other 
data structure for RUNNING tasks. 
In the sparse matrix proposal, note that if the first TIP in a location's row is 
running, then all TIPs in that row are running, since we always move a TIP's 
column to the back whenever we choose that TIP for running. Also, we don't 
consider tasks for speculative execution until we run out of virgin tasks. So 
when we do want to consider tasks for speculative execution, we go in the 
order: local, rack local, off rack. We hit each of these locations in O(1), 
and the time to find a speculative task in a particular row is given by the 
position of the first slow task in that row. We also move the corresponding 
TIP column to the back in exactly the same way we do for virgin tasks. This way 
we do speculative execution in the order of split sizes as well. 
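
To make the bookkeeping concrete, here is a rough Python sketch of the idea 
(not the actual JobTracker code, and not taken from the patch). The names Tip, 
LocationRows, next_virgin and next_speculative are made up for illustration, 
the "slow" flag stands in for whatever test decides a task is lagging, and 
ordering rows by largest split first is an assumption. 

```python
from collections import OrderedDict


class Tip:
    """Hypothetical stand-in for a TaskInProgress (one column of the matrix)."""

    def __init__(self, name, split_size, locations):
        self.name = name
        self.split_size = split_size
        self.locations = locations  # rows (hosts, racks, ...) holding this TIP
        self.running = False
        self.slow = False  # placeholder for the real "is this task lagging" test


class LocationRows:
    """One row per location; each row keeps its TIPs in scheduling order."""

    def __init__(self, tips):
        self.rows = {}
        # Assumption: larger splits are scheduled first.
        for tip in sorted(tips, key=lambda t: -t.split_size):
            for loc in tip.locations:
                self.rows.setdefault(loc, OrderedDict())[tip.name] = tip

    def _move_to_back(self, tip):
        # Move the TIP's column to the back of every row it appears in.
        for loc in tip.locations:
            self.rows[loc].move_to_end(tip.name)

    def next_virgin(self, loc):
        """Return a not-yet-running TIP for loc, or None if the row has none."""
        row = self.rows.get(loc)
        if not row:
            return None
        first = next(iter(row.values()))
        if first.running:
            # Running TIPs are always behind virgin ones, so if the first TIP
            # is running, every TIP in this row is running.
            return None
        first.running = True
        self._move_to_back(first)
        return first

    def next_speculative(self, locs):
        """locs is ordered: local, rack local, off rack."""
        for loc in locs:
            row = self.rows.get(loc, {})
            # Cost is the position of the first slow task in the row.
            tip = next((t for t in row.values() if t.running and t.slow), None)
            if tip is not None:
                self._move_to_back(tip)
                return tip
        return None
```

Since running TIPs always sit behind virgin ones in every row, the rows 
themselves carry the information a separate RUNNING list would hold. 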

> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducers on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lagged behind on committing completed mapper tasks.
> The number of running mappers displayed on the web UI kept getting bigger.
> The job tracker eventually stopped responding to the web UI.
> No progress was reported afterwards.
> The job tracker was running on a separate node.
> The job tracker process consumed 100% cpu, with vm size 1.01g (reaching the 
> heap space limit).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
