[
https://issues.apache.org/jira/browse/ACCUMULO-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Christopher Tubbs resolved ACCUMULO-4145.
-----------------------------------------
Resolution: Fixed
> Eliminate constant unnecessary Text wrapping of immutable tableIDs
> ------------------------------------------------------------------
>
> Key: ACCUMULO-4145
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4145
> Project: Accumulo
> Issue Type: Improvement
> Reporter: Christopher Tubbs
> Assignee: Christopher Tubbs
> Labels: tedious
> Fix For: 1.8.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I started looking at ACCUMULO-4138 to see how KeyExtent is used to do
> overlaps, and I noticed a *lot* of {{new KeyExtent(new Text(String), ...)}}
> calls, where that first parameter is the Text version of tableId.
> It appears, we even do this practice so much, that KeyExtent has a built-in
> WeakReference caching of these tableIds, so the Java GC we can dedupe them
> and avoid creating too many.
> The thing is... a lot of this attempt to optimize appears to be the result of
> unnecessarily wrapping all these immutable String tableIDs with a Text
> object, yet it doesn't really seem to buy us anything. In most of our API and
> a lot of internal utilities, these are already Strings with references
> elsewhere in the code... so even if we dedupe the Text objects, we're not
> doing anything about these Strings. Worse, we're actually doing some
> protective copying here and there as we pass around these Text objects, using
> TextUtil, etc. This is completely unnecessary, because they were immutable
> before we copied them and wrapped them.
> In some cases, we actually call toString on the text objects to pass them
> around, and they get flip-flopped a few times between String and Text,
> depending on what the internal API accepts.
> At best, using the Text object helps us serialize KeyExtent... but that's
> questionably beneficial since DataOutput can already serialize with:
> {{out.writeUTF(String)}}, so that's not actually that helpful.
> The only other benefit I could possibly see is with compareTo for
> lexicographical comparison, but I have a hard time believing Text can compare
> faster than {{java.lang.String}}, and there shouldn't be any difference in
> the results of the comparison between the UTF-8 encoding of Text and the
> modified UTF encoding which is native to Java Strings, certainly not for the
> base-36 characters and fixed constants (for special tables) we use in
> tableIDs.
> We should just completely strip out all the Text versions of tableId,
> wherever possible. I've already done this as an exercise in the 1.6 branch,
> but it might be potentially risky as these are put in Java maps which don't
> have strict type checking ({{map.get(Object)}}, for instance) and are
> sometimes compared with {{.equals(Object)}}. For what it's worth, the only
> public API this really touches is {{o.a.a.core.data.KeyExtent}}, which was
> only inadvertently in the public API and has since been moved (IIRC). We can
> preserve the old methods for compatibility, if necessary (deprecated, of
> course).
> Also, getting rid of these Text objects, and just sticking with the immutable
> String objects, we'll be able to take advantage of future JVM improvements
> for String deduplication, like
> http://java-performance.info/java-string-deduplication/
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)