[
https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008597#comment-13008597
]
Allen Wittenauer commented on MAPREDUCE-2345:
---------------------------------------------
> But how about a running job with tens of thousands of tasks? We see that big
> running jobs use a lot of memory in the cluster.
This is almost always a sign that the data being read is not laid out
efficiently (or the block size is too small), that one needs to use
CombineFileInputFormat, or that there are simply too many reducers in play.
There is almost never a reason for a job to have task counts in the x0,000
range unless the dataset is Just That Big.
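To make the CombineFileInputFormat suggestion concrete, here is a minimal sketch of packing many small input files into fewer, larger splits. It assumes a later Hadoop release where the concrete CombineTextInputFormat subclass and Job.getInstance exist (on 0.21-era code the abstract CombineFileInputFormat has to be subclassed instead); the 256 MB split ceiling and the reducer count are illustrative values only, not tuning advice for any particular cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombineSmallFilesJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "combine-small-files");
    job.setJarByClass(CombineSmallFilesJob.class);

    // Pack many small files into each split so the job needs far fewer
    // map tasks, which in turn keeps JobTracker bookkeeping small.
    job.setInputFormatClass(CombineTextInputFormat.class);
    CombineTextInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024); // 256 MB cap per split

    // Keep the reducer count modest; thousands of reducers also inflate task counts.
    job.setNumReduceTasks(20);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}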
> Optimize jobtracker's memory usage
> -------------------------------------
>
> Key: MAPREDUCE-2345
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2345
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobtracker
> Affects Versions: 0.21.0
> Reporter: MengWang
> Labels: hadoop
> Fix For: 0.23.0
>
> Attachments: jt-memory-useage.bmp
>
>
> Too many tasks eat up a considerable amount of the JobTracker's heap space.
> According to our observation, a 50 GB heap can support up to 5,000,000 tasks,
> so we should optimize the JobTracker's memory usage to handle more jobs and
> tasks. A YourKit Java profile shows that counters, duplicate strings, and
> task objects waste too much memory. Our optimization around these three
> points reduced the JobTracker's memory usage to about one third.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira