Robert Joseph Evans created MAPREDUCE-4303:
----------------------------------------------
Summary: Look at using String.intern to dedupe some Strings
Key: MAPREDUCE-4303
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4303
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: applicationmaster
Affects Versions: 2.0.0-alpha, 0.23.3
Reporter: Robert Joseph Evans
MAPREDUCE-4301 fixes one source of excessive duplicate strings, but there are
other places where removing the duplicates is not as simple. In these
cases the strings come from an incoming RPC call or from parsing a file as it
is read in. The only real ways to dedupe them are to use String.intern(),
which could fill up the permgen space if not used properly, or to maintain
our own cache that does the same sort of thing as String.intern, but in the
heap.
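
To make the second option concrete, here is a minimal sketch of a heap-based
interner. It is illustrative only: the class name HeapStringInterner and its
methods are hypothetical and not part of Hadoop, and it assumes that holding a
strong reference to every distinct string is acceptable.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical heap-based interner (not an existing Hadoop class).
 * Canonical instances live in an ordinary map on the heap, so nothing
 * is added to the JVM's own String.intern() pool.
 */
final class HeapStringInterner {
    // Each distinct string maps to itself; the stored value is the canonical instance.
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the canonical instance equal to s, or null if s is null. */
    String intern(String s) {
        if (s == null) {
            return null;
        }
        // putIfAbsent returns the previously stored instance, or null if s was just added.
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct strings currently held. */
    int size() {
        return pool.size();
    }
}
```

A caller would pass each string parsed from an RPC or a file through
intern() before storing it, so equal strings share one instance. This sketch
never evicts entries, so it only suits values drawn from a small, bounded set
(state names, counter names); unbounded input such as free-form diagnostics
would need weak references or a size limit.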
The following are some of the places where I saw lots of duplicate strings
and that we should look at doing something about:
TaskAttemptStatusUpdateEvent$TaskAttemptState.stateString
MapTaskAttemptImpl.diagnostics
The keys to Counters.groups
GenericGroup.displayName
The keys to GenericGroup.counters
GenericCounter.displayName