[
https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577454#action_12577454
]
amar_kamat edited comment on HADOOP-2119 at 3/11/08 7:19 AM:
-------------------------------------------------------------
With an approach similar to the one discussed above, plus some optimizations (one
of which is that the batching of task commits is now done in stages, i.e.
*batch-size* tips from the queue get batch-committed in one go), we could process
a large number of maps successfully.
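For illustration, here is a minimal sketch of the staged batch commit described above. The class, queue, and method names ({{StagedBatchCommit}}, {{completedTipQueue}}, {{commitBatch}}) are hypothetical and do not reflect the actual JobTracker code or the attached patch.
{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Placeholder standing in for org.apache.hadoop.mapred.TaskInProgress. */
class TaskInProgress {}

public class StagedBatchCommit {
  // Batch commit size used in the run described above.
  private static final int BATCH_SIZE = 5000;

  private final Queue<TaskInProgress> completedTipQueue;

  public StagedBatchCommit(Queue<TaskInProgress> completedTipQueue) {
    this.completedTipQueue = completedTipQueue;
  }

  /** Drain the queue in stages: at most BATCH_SIZE tips are committed in one go. */
  public void commitPending() {
    List<TaskInProgress> batch = new ArrayList<>(BATCH_SIZE);
    while (!completedTipQueue.isEmpty()) {
      batch.clear();
      while (batch.size() < BATCH_SIZE && !completedTipQueue.isEmpty()) {
        batch.add(completedTipQueue.poll());
      }
      commitBatch(batch); // one commit call (one lock acquisition) per stage
    }
  }

  private void commitBatch(List<TaskInProgress> batch) {
    // In the real JobTracker this would update the bookkeeping for every tip
    // in the batch under a single synchronized section; elided here.
    System.out.println("committed " + batch.size() + " tips");
  }

  public static void main(String[] args) {
    Queue<TaskInProgress> queue = new ArrayDeque<>();
    for (int i = 0; i < 12000; i++) {
      queue.add(new TaskInProgress());
    }
    new StagedBatchCommit(queue).commitPending(); // commits 5000, 5000, then 2000
  }
}
{code}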
The job description is as follows
1) 250 nodes
2) random-writer modified so that the map data goes to the local filesystem and the reducers do nothing
3) num maps : 320,000
4) num reducers : 450
5) bytes per map : 8 MB
6) total data : 2.5 TB
7) batch commit size = 5000, i.e. at a time only 5000 tips are committed
The map phase took approximately 40 minutes.
The only remaining problem is reducer scheduling from the JT. The maps finish so
fast that the map load is always low, so the reducers only start after the maps
are done. Simple tricks such as increasing the number of _task completion
events_, _jetty threads_ etc. might help but won't provide a scalable solution.
So it seems that tweaking the load logic in the JT, i.e.
{{getNewTaskForTaskTracker()}}, is the only way; a rough sketch of the kind of
change we mean is below. We are currently trying a number of optimizations and
will post a stable/final version of the approach along with a patch soon.
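To make the idea concrete, here is a rough, hypothetical sketch of scheduling reduces off the number of *completed* maps instead of the instantaneous map load. The class name, threshold, and method signatures are illustrative assumptions, not the actual {{getNewTaskForTaskTracker()}} logic.
{code:java}
public class ReduceSchedulingSketch {
  // Hypothetical threshold: hand out reduces once this fraction of maps is done,
  // rather than waiting for the measured map load to drop.
  private static final double REDUCE_START_THRESHOLD = 0.05;

  private final int numMapTasks;
  private int finishedMapTasks = 0;

  public ReduceSchedulingSketch(int numMapTasks) {
    this.numMapTasks = numMapTasks;
  }

  public void mapFinished() {
    finishedMapTasks++;
  }

  /**
   * Decide whether a tracker asking for work should be handed a reduce.
   * Basing the decision on completed maps avoids the trap where fast-finishing
   * maps keep the map load low and reduces only start after all maps are done.
   */
  public boolean shouldScheduleReduce(int runningReducesOnTracker, int maxReduceSlots) {
    boolean enoughMapsDone = finishedMapTasks >= REDUCE_START_THRESHOLD * numMapTasks;
    boolean trackerHasRoom = runningReducesOnTracker < maxReduceSlots;
    return enoughMapsDone && trackerHasRoom;
  }

  public static void main(String[] args) {
    ReduceSchedulingSketch sketch = new ReduceSchedulingSketch(320000);
    for (int i = 0; i < 20000; i++) {
      sketch.mapFinished();
    }
    System.out.println(sketch.shouldScheduleReduce(0, 2)); // true: over 5% of maps done
  }
}
{code}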
> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
> Key: HADOOP-2119
> URL: https://issues.apache.org/jira/browse/HADOOP-2119
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.16.0
> Reporter: Runping Qi
> Assignee: Amar Kamat
> Priority: Critical
> Fix For: 0.17.0
>
> Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducers on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lags behind on committing completed mapper tasks.
> The number of running mappers displayed on the web UI kept getting bigger and bigger.
> The job tracker eventually stopped responding to the web UI.
> No progress was reported afterwards.
> The job tracker is running on a separate node.
> The job tracker process consumed 100% CPU, with a VM size of 1.01g (reaching the heap
> space limit).