[
https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999218#comment-12999218
]
MengWang commented on MAPREDUCE-2345:
-------------------------------------
The JobTracker's memory is mainly consumed by TaskInProgress objects. We submitted a job
with 100,087 tasks; the JT's memory usage was as follows:
org.apache.hadoop.mapred.TaskInProgress
object 100,087
Shallow size 29,625,752
Retained size 325,065,944 (96%)
Our optimization work is as follows:
(1) Reduce duplicated strings
The JobTracker stores too many duplicated strings, for example: splitClass name,
split locations, counter group names, counter names, display names, the jtIdentifier
of JobID, and the jobdir of MapOutputFile.
Using a StringCache reduced memory usage by nearly 15%.
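As an illustration of the idea in (1), here is a minimal sketch of what such a string cache could look like. The class body, the method names intern() and size(), and the choice of ConcurrentHashMap are assumptions made for this example; the StringCache in the actual patch is not shown in this comment and may differ.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; not the StringCache from the patch.
final class StringCache {
    // Maps each distinct string value to the single instance we keep alive.
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    /** Returns the canonical instance equal to s, storing s if it is the first one seen. */
    String intern(String s) {
        if (s == null) {
            return null;
        }
        String existing = cache.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct strings currently held. */
    int size() {
        return cache.size();
    }
}
```

A caller would pass every repeated value (split locations, counter names, and so on) through intern() before storing it, so that equal strings in different TIPs point at one shared object. Note that a cache like this never evicts entries, so it only pays off for values that really do repeat.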
(2) Counters should be lazily initialized
TIPs with no task attempt assigned should not create Counters.
(3) Restructure completed TIPs' counters
When a task completes, its TIP grows because of its
counters. To speed up counter updates and lookups, Counters uses a HashMap and a
cache, which cost too much memory. So we separated the counter values from the Counters
structure: all tasks share one CounterMap object, which maps <CounterGroupName,
CounterName> -> an index into a long array, and every TIP stores an array of its
counter values.
Using this method, the JT's memory usage was reduced by nearly 50%.
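To make the layout in (3) concrete, here is a rough sketch of a shared index map plus a per-TIP value array. Only the name CounterMap comes from the description above; TipCounterValues, all method names, and the key encoding are assumptions for illustration, and the real patch may be organized differently. The value array is also allocated only on the first update, in the spirit of (2).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one instance shared by all tasks of a job.
final class CounterMap {
    private final Map<String, Integer> indexByKey = new HashMap<>();

    private static String key(String group, String counter) {
        // '\u0000' is an assumed separator that should not occur in names.
        return group + '\u0000' + counter;
    }

    /** Returns the array index for the counter, assigning the next free one if new. */
    synchronized int indexOf(String group, String counter) {
        String k = key(group, counter);
        Integer idx = indexByKey.get(k);
        if (idx == null) {
            idx = indexByKey.size();
            indexByKey.put(k, idx);
        }
        return idx;
    }

    /** Returns the index for the counter, or -1 if it was never registered. */
    synchronized int lookup(String group, String counter) {
        Integer idx = indexByKey.get(key(group, counter));
        return idx == null ? -1 : idx;
    }

    synchronized int size() {
        return indexByKey.size();
    }
}

// Hypothetical per-TIP holder: just a long[] instead of a full Counters object.
// Not thread-safe; a real version would need the TIP's own locking.
final class TipCounterValues {
    private final CounterMap map;
    private long[] values; // stays null until the first counter is set

    TipCounterValues(CounterMap map) {
        this.map = map;
    }

    void set(String group, String counter, long value) {
        int idx = map.indexOf(group, counter);
        if (values == null || idx >= values.length) {
            long[] grown = new long[idx + 1];
            if (values != null) {
                System.arraycopy(values, 0, grown, 0, values.length);
            }
            values = grown;
        }
        values[idx] = value;
    }

    long get(String group, String counter) {
        int idx = map.lookup(group, counter);
        if (idx < 0 || values == null || idx >= values.length) {
            return 0L;
        }
        return values[idx];
    }

    boolean isAllocated() {
        return values != null;
    }
}
```

With this layout the group and counter names are stored once per job rather than once per TIP, and each completed TIP carries only eight bytes per counter plus the array header.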
> Optimize jobtracker's memory usage
> -------------------------------------
>
> Key: MAPREDUCE-2345
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2345
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobtracker
> Affects Versions: 0.21.0
> Reporter: MengWang
> Labels: hadoop
> Fix For: 0.23.0
>
> Attachments: jt-memory-useage.bmp
>
>
> Too many tasks will eat up a considerable amount of the JobTracker's heap space.
> According to our observation, a 50GB heap can support up to 5,000,000 tasks,
> so we should optimize the JobTracker's memory usage to support more jobs and tasks.
> A YourKit Java profile shows that counters, duplicate strings, and Task objects waste too
> much memory. Our optimizations around these three points reduced the JobTracker's
> memory usage to 1/3.