[
https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566626#action_12566626
]
Amar Kamat commented on HADOOP-2119:
------------------------------------
I will summarize the designs here
_Notations_
{noformat}
n : number of TIPs
c : number of TIPs that can potentially be executed locally.
k : maximum number of tasks that can be run simultaneously (<<n)
f : number of failed tasks (<<r)
Pop : get the first TIP
b : number of bins of TIPs = n/s (where, s : max number of TIPs per bin
(<k))
{noformat}
The following analysis considers the worst case scenarios.
||#||Algorithm For||Description||General Design Requirements||Time
Complexity||Space Complexity||Implementation Complexity||
|1|Virgin Task|Maintain a running pointer rather than starting from 0.\\ The
pointer will always more forward|Fast Pop and Delete operation|Pop : O(c)
(worst case but will happen at most once)\\No deletions required| O(1)|Low|
|2|Virgin Task|Maintain a doubly linked list using array|Fast Pop and Delete
operation|Pop : O(1)\\Delete : O(1)| O( n )|High|
|3|Failed Task|Maintain a sorted list of TIPs that have failed|Small Size, Less
Insert/Pop/Delete|All O(log(f)) operations|O(f)|Low|
|4|Running Task|Maintain an array of sorted lists (where the array is of size
{{b}}).\\Membership of a TIP to a bin/group is determined by
{{TIP.getPartition()/s}}|Large Size\\Frequent Insert/Pop/Delete|Insert :
O(log(s))\\Delete O(log(s))\\Scan : O(k) |O(k + b)|Medium|
|5|Running Task|Maintain an array of lists (where the array is of size
{{b}}).\\Membership of a TIP to a bin/group is determined by
{{TIP.getPartition()/s}}| Not sorted\\Large Size\\Frequent Insert/Delete|Insert
: O(1)\\Delete O(s)\\Scan : O(k) |O(k + b)|Medium|
|6|Running Task|Maintain one sorted list| Large Size\\Frequent
Insert/Delete|Insert : O(log(k))\\Delete : O(log(k))\\Scan : O(k) |O(k)|Simple|
{noformat}
Easiest to implement : (1) + (3) + (6)
Most Space efficient : (1) + (3) + (6)
Most Time efficient : (2) + (3) + (4) (most inefficient w.r.t space and
implementation complexity)
{noformat}
> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
> Key: HADOOP-2119
> URL: https://issues.apache.org/jira/browse/HADOOP-2119
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.16.0
> Reporter: Runping Qi
> Assignee: Amar Kamat
> Priority: Critical
> Fix For: 0.17.0
>
> Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducer on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lacks behind on committing completed mapper tasks.
> The number of running mappers displayed on web UI getting bigger and bigger.
> The jos tracker eventually stopped responding to web UI.
> No progress is reported afterwards.
> Job tracker is running on a separate node.
> The job tracker process consumed 100% cpu, with vm size 1.01g (reach the heap
> space limit).
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.