[ https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577454#action_12577454 ]

amar_kamat edited comment on HADOOP-2119 at 3/11/08 7:19 AM:
-------------------------------------------------------------

Using an approach similar to the one discussed above, plus some optimizations (one of which is that the batching of task commits now happens in stages, i.e. *batch-size* TIPs from the queue get committed in one go), we could process a large number of maps successfully.
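To make the staged batching concrete, here is a minimal sketch of the idea; the names ({{BatchCommitter}}, {{pendingCommits}}, {{commitBatch}}) are hypothetical illustrations, not the actual JobTracker code:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of staged batch commits: instead of committing every
// finished TIP individually under the JobTracker's global lock, finished TIPs
// are queued cheaply and a committer thread drains up to BATCH_SIZE of them
// per stage, committing each stage in a single pass.
class BatchCommitter {
    static final int BATCH_SIZE = 5000;
    private final Queue<TaskInProgress> pendingCommits =
        new ConcurrentLinkedQueue<TaskInProgress>();

    void taskFinished(TaskInProgress tip) {
        pendingCommits.offer(tip);   // cheap; no global lock taken here
    }

    void commitPending() {           // called periodically by a committer thread
        List<TaskInProgress> batch = new ArrayList<TaskInProgress>(BATCH_SIZE);
        TaskInProgress tip;
        while (batch.size() < BATCH_SIZE && (tip = pendingCommits.poll()) != null) {
            batch.add(tip);
        }
        if (!batch.isEmpty()) {
            commitBatch(batch);      // one synchronized commit for the whole stage
        }
    }

    private void commitBatch(List<TaskInProgress> batch) {
        // Update job/tracker state once per batch instead of once per task.
    }
}

// Stand-in for the real org.apache.hadoop.mapred.TaskInProgress.
class TaskInProgress {}
{code}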
The job setup was as follows:
1) 250 nodes
2) random-writer modified so that the map output goes to the local filesystem and the reducers do nothing
3) num maps : 320,000
4) num reducers : 450
5) bytes per map : 8 MB
6) total data : 2.5 TB
7) batch commit size : 5,000, i.e. at most 5,000 TIPs are committed at a time
The map phase took approximately 40 minutes.
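As a quick sanity check on these figures (a throwaway snippet, not from the patch; it assumes 1 MB = 10^6 bytes and 1 TB = 10^12 bytes):

{code:java}
public class JobSizeCheck {
    public static void main(String[] args) {
        long numMaps = 320000L;               // "3,20,000" in Indian digit grouping
        long bytesPerMap = 8L * 1000 * 1000;  // 8 MB per map
        double totalTB = numMaps * bytesPerMap / 1e12;
        // Prints "total data = 2.56 TB", i.e. roughly the 2.5 TB quoted above
        // (with 1 MiB = 2^20 bytes it comes out to ~2.68 TB).
        System.out.printf("total data = %.2f TB%n", totalTB);
    }
}
{code}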
The only remaining problem is reducer scheduling in the JT. The maps finish so fast that the reported map load is always low, and as a result the reducers always start after the maps are done. Simple tricks such as increasing the number of _task completion events_ or _jetty threads_ might help, but they won't provide a scalable solution. So it seems that tweaking the load logic in the JT, i.e. {{getNewTaskForTaskTracker()}}, is the only way; a rough sketch of that kind of tweak follows below. We are currently trying out a number of optimizations and will post a stable/final version of the approach along with a patch soon.
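For illustration only, here is a minimal sketch of the kind of load tweak meant above: hold back reduce tasks until some fraction of the maps has completed, so the shuffle overlaps with the remaining maps. The {{ReduceSchedulingSketch}} class and the 5% threshold are assumptions, not values from this issue:

{code:java}
// Hypothetical sketch: only hand out reduce tasks once enough maps are done.
// A check like this could gate the reduce branch of the JT's task-assignment
// logic (i.e. getNewTaskForTaskTracker()).
class ReduceSchedulingSketch {
    static final double MIN_MAPS_DONE_FRACTION = 0.05;  // assumed threshold

    boolean shouldScheduleReduce(int finishedMaps, int totalMaps) {
        return totalMaps == 0
            || (double) finishedMaps / totalMaps >= MIN_MAPS_DONE_FRACTION;
    }
}
{code}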

> JobTracker becomes non-responsive if the task trackers finish tasks too fast
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: hadoop-2119.patch, hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducers on a cluster with 390 nodes.
> The mappers ran very fast.
> The jobtracker lagged behind in committing completed mapper tasks.
> The number of running mappers displayed on the web UI kept getting bigger and bigger.
> The job tracker eventually stopped responding to the web UI.
> No progress was reported afterwards.
> The job tracker was running on a separate node.
> The job tracker process consumed 100% CPU, with a VM size of 1.01 GB (reaching the heap space limit).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
