[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652467#comment-16652467
 ] 

Hudson commented on MAPREDUCE-7150:
-----------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15231 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15231/])
MAPREDUCE-7150. Optimize collections used by MR JHS to reduce its (haibochen: 
rev babd1449bf8898f44c434c852e67240721c0eb00)
* (edit) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* (edit) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/FileSystemCounterGroup.java
* (edit) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/CompletedTask.java
* (edit) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/CompletedTaskAttempt.java


> Optimize collections used by MR JHS to reduce its memory
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-7150
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver, mrv2
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: YARN-8872.01.patch, YARN-8872.02.patch, 
> YARN-8872.03.patch, YARN-8872.04.patch, jhs-bad-collections.png
>
>
> Using jxray (www.jxray.com), we analyzed a heap dump of a JHS running with a big 
> heap in a large cluster, handling large MapReduce jobs. The heap is large 
> (over 32GB) and 21.4% of it is wasted due to various suboptimal Java 
> collections, mostly maps and lists that are either empty or contain only one 
> element. In such under-populated collections, a considerable amount of memory is 
> still used by just the internal implementation objects. See the attached 
> excerpt from the jxray report for the details. If certain collections are 
> almost always empty, they should be initialized lazily. If others almost 
> always have just 1 or 2 elements, they should be initialized with the 
> appropriate initial capacity of 1 or 2 (the default capacity is 16 for 
> HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
>  # {{FileSystemCounterGroup.map}} - initialize lazily
>  # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks 
> only have one or two attempts
>  # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
>  # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it 
> contains one diagnostic message most of the time
>  # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to 
> use the more wasteful LinkedList here) and initialize with capacity 1.
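
The two patterns proposed above (lazy initialization for collections that are usually empty, and a small explicit capacity for collections that usually hold one or two elements) might look roughly like the sketch below. This is only an illustration, not the code from the attached patches: the class names LazyCounters and TaskSketch, and all of their fields and methods, are hypothetical and do not correspond to the real FileSystemCounterGroup or CompletedTask implementations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these are not the actual Hadoop classes.
class CollectionSizingSketch {

  // Pattern 1: lazy initialization for a map that is almost always empty.
  static class LazyCounters {
    private Map<String, Long> map; // stays null until the first write

    void increment(String key) {
      if (map == null) {
        map = new HashMap<>(); // allocated only when actually needed
      }
      map.merge(key, 1L, Long::sum);
    }

    long get(String key) {
      // Readers must tolerate the unallocated state.
      return map == null ? 0L : map.getOrDefault(key, 0L);
    }

    boolean isAllocated() {
      return map != null;
    }
  }

  // Pattern 2: explicit small capacities for collections that usually
  // hold only one or two elements. ArrayList is used rather than
  // LinkedList, which allocates a separate node object per element.
  static class TaskSketch {
    private final List<String> diagnostics = new ArrayList<>(1);
    private final Map<String, String> attempts = new HashMap<>(2);

    void addDiagnostic(String message) {
      diagnostics.add(message); // still grows automatically if needed
    }

    void addAttempt(String attemptId, String state) {
      attempts.put(attemptId, state);
    }

    List<String> diagnostics() {
      return Collections.unmodifiableList(diagnostics);
    }

    int attemptCount() {
      return attempts.size();
    }
  }

  public static void main(String[] args) {
    LazyCounters counters = new LazyCounters();
    System.out.println("allocated before write: " + counters.isAllocated());
    counters.increment("bytesRead");
    System.out.println("allocated after write: " + counters.isAllocated());

    TaskSketch task = new TaskSketch();
    task.addDiagnostic("first message");
    task.addDiagnostic("second message"); // exceeds the capacity hint of 1
    task.addAttempt("attempt_0", "SUCCEEDED");
    System.out.println(task.diagnostics() + " / " + task.attemptCount());
  }
}
```

The initial capacity is only a sizing hint: both collections still grow on demand, so an occasional task with more attempts or diagnostics than expected remains correct and merely pays for a resize. The cost of the lazy variant is that every reader has to handle the not-yet-allocated case.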



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
