[
https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566472#action_12566472
]
Vivek Ratan commented on HADOOP-2119:
-------------------------------------
>> This is unacceptable. We used to have this behavior and it was very bad. If
>> map 0 fails deterministically, you absolutely do not want all 200,000 maps
>> to fail 4 times each before the job fails.
That's a valid point. However, today's code doesn't handle this situation
completely well either. If you're treating a virgin task and a failed task with
the same priority, then it's quite possible that you run a number of virgin
tasks before you get to a failed task. You should really be prioritizing failed
tasks over virgin tasks if you want to avoid the situation you've mentioned,
though such a policy may cause unfairness in other situations.
I think what we really need is what I mentioned for Options 3 and 4 - separate
data structures to hold virgin tasks, failed tasks, and running tasks.
- The data structure for virgin tasks does not need dynamic insertion
capabilities (i.e., once created, you will never add any task to it). It also
needs to be sorted by size. This structure needs to support deletion (once a
task runs, it will be removed from this structure), and it also needs to handle
large sizes (up to the number of tasks).
- The data structure for failed tasks need to support quick insertions and
deletions, but will be smaller in size (as the number of failed tasks is
typically low).
- The data structure for running tasks need to support quick insertions and
deletions, and it's size is bounded (by the number of task trackers), though
not necessarily small. Note that when a task is running and its status changes,
we'll need to find it quickly in the running list so we can move it to
another. The data structure should optimize a lookup as much as possible.
Given all this, Devaraj, Amar, and I feel that the following data structure
choices are the most appropriate:
- We leave the cache as is.
- For virgin tasks, we keep the tasks array, and have a running index to it (as
suggested in Option 1). We could alternately have a doubly linked list
structure with some sort of indexing (a list implemented as an array).
- For failed tasks, we can keep a linked list or a sorted structure such as a
Java sorted set. If we're OK not needing to keep failed tasks sorted by size
(we could sort them by when they ran), then we can just keep a linked list and
insert at the end.
- For running tasks, given that the structure size is potentially larger than
that for failed tasks, we can still keep a sorted map, or have some sort of a
composite structure (which Devaraj/Amar have discussed, and can describe).
Again, if we're OK not needing to keep running tasks sorted by size (we could
sort them by when they ran), then we can just keep a linked list and insert at
the end.
By having multiple lists and keeping the array for virgin tasks, we're using
more memory, but if that becomes a problem, we can use some other structure to
represent virgin tasks.
> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
> Key: HADOOP-2119
> URL: https://issues.apache.org/jira/browse/HADOOP-2119
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.16.0
> Reporter: Runping Qi
> Assignee: Amar Kamat
> Priority: Critical
> Fix For: 0.17.0
>
> Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducer on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lacks behind on committing completed mapper tasks.
> The number of running mappers displayed on web UI getting bigger and bigger.
> The jos tracker eventually stopped responding to web UI.
> No progress is reported afterwards.
> Job tracker is running on a separate node.
> The job tracker process consumed 100% cpu, with vm size 1.01g (reach the heap
> space limit).
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.