[jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast

Vivek Ratan (JIRA) Wed, 06 Feb 2008 04:59:34 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566120#action_12566120
 ]


Vivek Ratan commented on HADOOP-2119:
-------------------------------------

Regarding finding the next task to run: 

The problem is that we scan the array of TaskInProgress(TIP) objects (the 
'tasks' array) from the beginning, each time we need to find a task to run and 
we haven't found one in the cache. When you have lots and lots of tasks, after 
some time the tasks array will predominantly be filled with running or 
completed tasks, especially at lower indexes. So it takes longer and longer to 
scan the array for virgin tasks (tasks that haven't run yet)  or failed tasks 
if we always start from the beginning. There are a number of ways to get around 
this: 

1. On a cache miss, if you don't mind returning a virgin task ahead of a failed 
task, even though the failed task occurs earlier in the array than the virgin 
task, then you can do the following. Keep an index (a pointer) into the tasks 
array, which points to the first task in the array that is a virgin task. Every 
TIP to the left of this index will have been run at least once. Every time we 
need to scan the array, we look for runnable tasks from this index onwards. 
This index will move across the length of the array just once during the 
running of all tasks, so it averages out to O(1) for finding the first virgin 
task. This is fairly easy to code, and should provide significant savings in 
the cases where there are lots and lots of tasks. If a virgin task is not 
found, you can use the same code to look for a failed task or a speculative 
task. 

2. If you want to keep the same algorithm for finding the next task, i.e., if 
you want to return either a virgin or failed task first on a cache miss, then 
you can keep an index into the array which represents the first failed or 
virgin task. This approach won't be as fast as the first one, since this index 
can move back and forth across the array, but it should still be better over 
what we do today. In both options, the index will have to be potentially 
updated whenever a task's status changes. 

A more detailed approach requires maintaining separate lists for tasks with 
different states: one for virgin tasks, one for failed, one for completed, and 
one for running. As a task's status changes, it moves from one list to another. 
These lists can be sorted (the list of virgin tasks should be sorted in 
decreasing order of input size; the other lists can be sorted the same way, or 
perhaps in order of when the tasks ran, which simplifies things). Picking the 
next task is as simple as walking down the list (in many cases, just picking 
the first element of a list). Care must be taken that moving an element from 
one list to another is as effective as possible. The array of TIPS is no longer 
needed. I can think of at least a couple of ways of doing this: 

3. Modify a TIP object to have a reference to a 'previous' TIP object and a 
'next' TIP object, so that we have linked lists of TIPS. Then a list is just a 
reference to the first (and last) TIP object, and a TIP belongs to just one 
list. If objects are added into the Running, Completed, and Failed lists at the 
tail only, handling a TIP's status change is O(1), but the lists are no longer 
sorted by size. Rather they are sorted by when the task ran. Finding the next 
task is pretty much constant, as we either pick the first TIP in the Virgin 
list, or the first TIP in the failed list (such that the TIP didn't fail on the 
host), or we walk through the Running list to find the first speculative task. 
If you need the lists sorted by size, then insertion is O(n). This is a fair 
bit of change, as far as coding is concerned. 

4. Lists can also be implemented as arrays of TIPS. This takes more space but 
moving a TIP from one list to another is faster than in the linked-list case. 
Insertion into a list sorted by size can be O(log n). 

You can also have hybrid approaches where you just keep lists for virgin tasks, 
for example. 

I think it makes most sense to go with Option 1 for now, as it's the easiest to 
implement and makes the most common case run much faster. Options 3 and 4 need 
a fair bit of refactoring and may be an overkill for now, since you can get the 
most bang for the buck by just making sure that you don't scan the array from 
the beginning for virgin tasks.  

> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducer on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lacks behind on committing completed mapper tasks.
> The number of running mappers displayed on web UI getting bigger and bigger.
> The jos tracker eventually stopped responding to web UI.
> No progress is reported afterwards.
> Job tracker is running on a separate node.
> The job tracker process consumed 100% cpu, with vm size 1.01g (reach the heap 
> space limit).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2119) JobTracker becomes non-responsive if the task trackers finish task too fast

Reply via email to