[ https://issues.apache.org/jira/browse/MAPREDUCE-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans resolved MAPREDUCE-4303.
--------------------------------------------

    Resolution: Duplicate
    
> Look at using String.intern to dedupe some Strings
> --------------------------------------------------
>
>                 Key: MAPREDUCE-4303
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4303
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster
>    Affects Versions: 0.23.3, 2.0.0-alpha
>            Reporter: Robert Joseph Evans
>
> MAPREDUCE-4301 fixes one source of duplicate strings, but there are 
> other places where removing the duplicates is not as simple.  In these 
> cases the strings come from an incoming RPC call or from parsing and 
> reading in a file.  The only real ways to dedupe these are to use 
> String.intern(), which could fill up the permgen space if not used 
> properly, or to maintain our own cache that does the same sort of 
> thing as String.intern, but in the heap.
> The following are places where I saw lots of duplicate strings and that 
> we should look at doing something about:
> TaskAttemptStatusUpdateEvent$TaskAttemptState.stateString
> MapTaskAttemptImpl.diagnostics
> The keys to Counters.groups
> GenericGroup.displayName
> The keys to GenericGroup.counters
> and GenericCounter.displayName
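
The second option in the description, a cache that behaves like String.intern but lives in the heap, could look roughly like the sketch below. This is only an illustration: the class name HeapStringInterner and its methods are hypothetical and are not taken from Hadoop or from the fix this issue duplicates. It assumes a simple strongly-referencing map, which means entries are never evicted.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical heap-based string interner (illustrative name, not a Hadoop class).
 * Equal strings are collapsed onto one canonical instance held in an ordinary
 * heap map, so permgen is not touched.
 */
final class HeapStringInterner {
    // Maps each distinct string value to its canonical instance.
    private final ConcurrentMap<String, String> cache =
        new ConcurrentHashMap<String, String>();

    /** Returns the canonical instance equal to s, registering s if it is new. */
    String intern(String s) {
        if (s == null) {
            return null; // ConcurrentHashMap rejects null keys
        }
        String existing = cache.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct strings currently held. */
    int size() {
        return cache.size();
    }
}
```

A caller would pass each string read from an RPC or a parsed file through intern() before storing it, for example for a counter group key or a display name. Because this sketch holds strong references, the cache grows with every distinct value seen; free-form values such as diagnostics would probably need a weak-reference or bounded variant instead.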

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
