[jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast

Amar Kamat (JIRA) Thu, 07 Feb 2008 07:12:31 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566626#action_12566626
 ]


Amar Kamat commented on HADOOP-2119:
------------------------------------

I will summarize the designs here 
_Notations_
{noformat}
n      : number of TIPs
c      : number of TIPs that can potentially be executed locally.
k      : maximum number of tasks that can be run simultaneously (<<n)
f      : number of failed tasks (<<r)
Pop    : get the first TIP
b      : number of bins of TIPs = n/s (where, s  : max number of TIPs per bin 
(<k))
{noformat}
The following analysis considers the worst case scenarios.
||#||Algorithm For||Description||General Design Requirements||Time 
Complexity||Space Complexity||Implementation Complexity||
|1|Virgin Task|Maintain a running pointer rather than starting from 0.\\ The 
pointer will always more forward|Fast Pop and Delete operation|Pop : O(c) 
(worst case but will happen at most once)\\No deletions required| O(1)|Low|
|2|Virgin Task|Maintain a doubly linked list using array|Fast Pop and Delete 
operation|Pop : O(1)\\Delete : O(1)| O( n )|High|
|3|Failed Task|Maintain a sorted list of TIPs that have failed|Small Size, Less 
Insert/Pop/Delete|All O(log(f)) operations|O(f)|Low|
|4|Running Task|Maintain an array of sorted lists (where the array is of size 
{{b}}).\\Membership of a TIP to a bin/group is determined by 
{{TIP.getPartition()/s}}|Large Size\\Frequent Insert/Pop/Delete|Insert : 
O(log(s))\\Delete O(log(s))\\Scan : O(k) |O(k + b)|Medium|
|5|Running Task|Maintain an array of lists (where the array is of size 
{{b}}).\\Membership of a TIP to a bin/group is determined by 
{{TIP.getPartition()/s}}| Not sorted\\Large Size\\Frequent Insert/Delete|Insert 
: O(1)\\Delete O(s)\\Scan : O(k)  |O(k + b)|Medium|
|6|Running Task|Maintain one sorted list| Large Size\\Frequent 
Insert/Delete|Insert : O(log(k))\\Delete : O(log(k))\\Scan : O(k) |O(k)|Simple|

{noformat}
Easiest to implement    :  (1) + (3) + (6)
Most Space efficient    :  (1) + (3) + (6)
Most Time efficient     :  (2) + (3) + (4) (most inefficient w.r.t space and 
implementation complexity)
{noformat}

> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducer on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lacks behind on committing completed mapper tasks.
> The number of running mappers displayed on web UI getting bigger and bigger.
> The jos tracker eventually stopped responding to web UI.
> No progress is reported afterwards.
> Job tracker is running on a separate node.
> The job tracker process consumed 100% cpu, with vm size 1.01g (reach the heap 
> space limit).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast

Reply via email to