[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652407#comment-16652407 ]
Haibo Chen commented on MAPREDUCE-7150:
---------------------------------------
+1 on the latest patch. Will check it in shortly.
> Optimize collections used by MR JHS to reduce its memory
> --------------------------------------------------------
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobhistoryserver, mrv2
> Reporter: Misha Dmitriev
> Assignee: Misha Dmitriev
> Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch,
> YARN-8872.03.patch, YARN-8872.04.patch, jhs-bad-collections.png
>
>
> We used jxray (www.jxray.com) to analyze a heap dump of a JHS running with a
> big heap in a large cluster, handling large MapReduce jobs. The heap is large
> (over 32GB) and 21.4% of it is wasted due to various suboptimal Java
> collections, mostly maps and lists that are either empty or contain only one
> element. In such under-populated collections, a considerable amount of memory
> is still used by just the internal implementation objects. See the attached
> excerpt from the jxray report for the details. If certain collections are
> almost always empty, they should be initialized lazily. If others almost
> always have just 1 or 2 elements, they should be initialized with the
> appropriate initial capacity of 1 or 2 (the default capacity is 16 for
> HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
> # {{FileSystemCounterGroup.map}} - initialize lazily
> # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks
> only have one or two attempts
> # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
> # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it
> contains one diagnostic message most of the time
> # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to
> use the more wasteful LinkedList here) and initialize with capacity 1.
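
The two techniques named in the description above (lazy initialization of
collections that are usually empty, and small initial capacities for
collections that usually hold one or two elements) can be sketched roughly as
follows. This is an illustration only, not the contents of the attached
patches: the class names LazyCounterGroup and SmallTaskRecord, their methods,
and the capacity of 2 chosen for the maps are all assumptions made for the
example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a class whose map is empty most of the time.
class LazyCounterGroup {
    // Stays null until the first write, so an unused instance pays for
    // one reference field instead of a whole HashMap object.
    private Map<String, Long> counters;

    void increment(String name, long delta) {
        if (counters == null) {
            // Assumed capacity: request 2 buckets instead of the default 16.
            counters = new HashMap<>(2);
        }
        counters.merge(name, delta, Long::sum);
    }

    long get(String name) {
        return counters == null ? 0L : counters.getOrDefault(name, 0L);
    }

    // Readers never see null: an unallocated map is reported as empty.
    Map<String, Long> view() {
        if (counters == null) {
            return Collections.<String, Long>emptyMap();
        }
        return Collections.unmodifiableMap(counters);
    }

    boolean isAllocated() {
        return counters != null;
    }
}

// Hypothetical stand-in for a record that usually holds one or two entries.
class SmallTaskRecord {
    // Requested capacity of 2; the map still grows if more attempts arrive.
    private final Map<String, String> attempts = new HashMap<>(2);
    // ArrayList with capacity 1 instead of a LinkedList: one backing array
    // rather than a separate node object per element.
    private final List<String> diagnostics = new ArrayList<>(1);

    void addAttempt(String attemptId, String state) {
        attempts.put(attemptId, state);
    }

    void addDiagnostic(String message) {
        diagnostics.add(message);
    }

    Map<String, String> attempts() {
        return Collections.unmodifiableMap(attempts);
    }

    List<String> diagnostics() {
        return Collections.unmodifiableList(diagnostics);
    }
}

class CollectionSizingSketch {
    public static void main(String[] args) {
        LazyCounterGroup group = new LazyCounterGroup();
        System.out.println("allocated before write: " + group.isAllocated());
        group.increment("bytesRead", 5L);
        System.out.println("allocated after write: " + group.isAllocated());

        SmallTaskRecord task = new SmallTaskRecord();
        task.addAttempt("attempt_0", "SUCCEEDED");
        task.addDiagnostic("one diagnostic message");
        System.out.println("attempts: " + task.attempts().size()
                + ", diagnostics: " + task.diagnostics().size());
    }
}
```

A small initial capacity only changes the starting size of the backing
storage. Both HashMap and ArrayList still grow on demand, so the occasional
task with many attempts or diagnostics behaves correctly and just pays for a
resize.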
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)