[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008597#comment-13008597
 ] 

Allen Wittenauer commented on MAPREDUCE-2345:
---------------------------------------------

> But how about a running job with tens of thousands of tasks? We see that big
> running jobs use much memory in the cluster.

This is almost always a sign that the data being read is not laid out
efficiently, that the block size is too small, that one needs to use
CombineFileInputFormat, or that there are just too many reducers in play.
There is almost never a reason for a job to have tasks in the tens of
thousands unless the dataset is Just That Big.
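
As a rough illustration (not from this thread), packing many small input files
into fewer splits is usually done with a combine-style input format plus a cap
on split size. The sketch below assumes a newer Hadoop release that ships
org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat; the class choice
and the 256 MB cap are illustrative only, not a recommendation from this issue.

// Hedged sketch: combine small files into larger splits so a job over
// thousands of tiny files produces far fewer map tasks.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class FewerSplitsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "fewer-splits");

    // Each split may combine many small files, up to 256 MB in total.
    job.setInputFormatClass(CombineTextInputFormat.class);
    FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    // mapper, reducer, and output setup omitted for brevity
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}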

> Optimize jobtracker's memory usage
> -------------------------------------
>
>                 Key: MAPREDUCE-2345
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2345
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.21.0
>            Reporter: MengWang
>              Labels: hadoop
>             Fix For: 0.23.0
>
>         Attachments: jt-memory-useage.bmp
>
>
> Too many tasks eat up a considerable amount of the JobTracker's heap space.
> According to our observation, a 50GB heap can support up to 5,000,000 tasks,
> so we should optimize the JobTracker's memory usage to handle more jobs and
> tasks. YourKit Java profiling shows that counters, duplicate strings, and
> tasks waste too much memory. Our optimization around these three points
> reduced the JobTracker's memory usage to 1/3.
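
As a hedged editorial aside (the attached patch may do something different),
one common way to attack the duplicate-string part of such a profile in a
long-lived daemon like the JobTracker is to canonicalize repeated values
(host names, counter group and counter names) through a single map, so equal
strings share one instance on the heap. The helper below is hypothetical.

import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not from this issue's patch: returns one shared
// instance per distinct string so repeated values stop multiplying on
// the heap.
public class StringCanonicalizer {
  private final ConcurrentHashMap<String, String> pool =
      new ConcurrentHashMap<String, String>();

  public String canonicalize(String s) {
    if (s == null) {
      return null;
    }
    String existing = pool.putIfAbsent(s, s);
    return existing != null ? existing : s;
  }
}

String.intern() gives a similar effect, with the pooled strings kept in the
JVM's intern table instead of an application-managed map.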

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
