[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479420#comment-13479420
 ] 

Robert Joseph Evans commented on MAPREDUCE-4229:
------------------------------------------------

I ran some benchmarks looking at the Job History server using a jhist file for 
a job that had 9416 maps and 500 reducers.  I then used a combination of 
YourKit and jhat to look at the heap savings.

For jhat I ran the OQL query {noformat}select 
sum(map(heap.objects("java.lang.String"),"sizeof(it)")){noformat} to get the 
size of all of the strings currently reachable on the heap.

I saw no change between the base and the first patch.  Both of 
them had 22MB of strings in the heap.  Looking at the code that was changed to 
do interning, the only code that uses it is Rumen.  It is still a good change, 
but it did not have the impact I was looking for.  So I implemented the patch I 
just attached, which adds interning of the Strings that are parsed out of the 
jhist file.  This reduced the 22MB of strings to 3MB of strings.
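
To illustrate why interning parsed strings can shrink the heap this much, here is a 
minimal sketch. It is not the code from the attached patch: the class InternSketch, 
the StringPool helper, and the parseNames method are hypothetical names made up for 
this example, and the counter name is just a sample value. The idea is that a parser 
produces a fresh String for every occurrence of the same text, and a pool hands back 
one canonical instance so the duplicates can be garbage collected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the code from the MAPREDUCE-4229 patch.
final class InternSketch {

    /** Assumed helper: a simple, non-thread-safe pool of canonical strings. */
    static final class StringPool {
        private final Map<String, String> pool = new HashMap<>();

        /** Returns the canonical instance equal to s, adding s if it is new. */
        String intern(String s) {
            if (s == null) {
                return null;
            }
            String existing = pool.putIfAbsent(s, s);
            return existing == null ? s : existing;
        }

        int size() {
            return pool.size();
        }
    }

    /**
     * Simulates parsing the same counter name once per task. new String(...)
     * stands in for a parser that allocates a fresh copy on every read.
     */
    static List<String> parseNames(int tasks, StringPool pool) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            String parsed = new String("FILE_BYTES_READ"); // fresh copy each time
            names.add(pool.intern(parsed));                // keep only the canonical one
        }
        return names;
    }

    public static void main(String[] args) {
        StringPool pool = new StringPool();
        List<String> names = parseNames(9416, pool);
        // Every entry refers to the same object, so only one copy stays reachable.
        System.out.println(names.get(0) == names.get(names.size() - 1));
        System.out.println(pool.size());
    }
}
```

The JDK's own String.intern() gives the same canonicalization without a separate 
pool object; the explicit map is used here only to make the mechanism visible. A 
real server would also need to consider thread safety and how long the pool lives.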

I want to do something similar for the AM, but it is more difficult to look at, 
and I don't think I will have time in the near future. So if someone else could 
review this we can check it in and file a follow-up JIRA for looking at the AM. 
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, 
> MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing 
> counter names (String objects and character arrays). Since most counter names 
> are reused again and again, it would be a big memory savings to keep a hash 
> set of already-used counter names within a job, and refer to the same object 
> from all tasks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
