Christopher Tubbs created ACCUMULO-4145:
-------------------------------------------
Summary: Eliminate constant unnecessary Text wrapping of immutable
tableIDs
Key: ACCUMULO-4145
URL: https://issues.apache.org/jira/browse/ACCUMULO-4145
Project: Accumulo
Issue Type: Improvement
Reporter: Christopher Tubbs
Assignee: Christopher Tubbs
Fix For: 1.8.0
I started looking at ACCUMULO-4138 to see how KeyExtent is used to do overlaps,
and I noticed a *lot* of {{new KeyExtent(new Text(String), ...)}} calls, where
that first parameter is the Text version of tableId.
It appears, we even do this practice so much, that KeyExtent has a built-in
WeakReference caching of these tableIds, so the Java GC we can dedupe them and
avoid creating too many.
The thing is... a lot of this attempt to optimize appears to be the result of
unnecessarily wrapping all these immutable String tableIDs with a Text object,
yet it doesn't really seem to buy us anything. In most of our API and a lot of
internal utilities, these are already Strings with references elsewhere in the
code... so even if we dedupe the Text objects, we're not doing anything about
these Strings. Worse, we're actually doing some protective copying here and
there as we pass around these Text objects, using TextUtil, etc. This is
completely unnecessary, because they were immutable before we copied them and
wrapped them.
In some cases, we actually call toString on the text objects to pass them
around, and they get flip-flopped a few times between String and Text,
depending on what the internal API accepts.
At best, using the Text object helps us serialize KeyExtent... but that's
questionably beneficial since DataOutput can already serialize with:
{{out.writeUTF(String)}}, so that's not actually that helpful.
The only other benefit I could possibly see is with compareTo for
lexicographical comparison, but I have a hard time believing Text can compare
faster than {{java.lang.String}}, and there shouldn't be any difference in the
results of the comparison between the UTF-8 encoding of Text and the modified
UTF encoding which is native to Java Strings, certainly not for the base-36
characters and fixed constants (for special tables) we use in tableIDs.
We should just completely strip out all the Text versions of tableId, wherever
possible. I've already done this as an exercise in the 1.6 branch, but it might
be potentially risky as these are put in Java maps which don't have strict type
checking ({{map.get(Object)}}, for instance) and are sometimes compared with
{{.equals(Object)}}. For what it's worth, the only public API this really
touches is {{o.a.a.core.data.KeyExtent}}, which was only inadvertently in the
public API and has since been moved (IIRC). We can preserve the old methods for
compatibility, if necessary (deprecated, of course).
Also, getting rid of these Text objects, and just sticking with the immutable
String objects, we'll be able to take advantage of future JVM improvements for
String deduplication, like
http://java-performance.info/java-string-deduplication/
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)