[
https://issues.apache.org/jira/browse/ACCUMULO-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154316#comment-15154316
]
ASF GitHub Bot commented on ACCUMULO-4145:
------------------------------------------
Github user keith-turner commented on a diff in the pull request:
https://github.com/apache/accumulo/pull/72#discussion_r53470957
--- Diff:
core/src/main/java/org/apache/accumulo/core/client/impl/ScannerImpl.java ---
@@ -60,7 +59,7 @@ public ScannerImpl(ClientContext context, String table,
Authorizations authoriza
checkArgument(table != null, "table is null");
checkArgument(authorizations != null, "authorizations is null");
this.context = context;
- this.table = new Text(table);
+ this.table = table;
--- End diff --
seems like this is a tableId... could change the name
> Eliminate constant unnecessary Text wrapping of immutable tableIDs
> ------------------------------------------------------------------
>
> Key: ACCUMULO-4145
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4145
> Project: Accumulo
> Issue Type: Improvement
> Reporter: Christopher Tubbs
> Assignee: Christopher Tubbs
> Labels: tedious
> Fix For: 1.8.0
>
>
> I started looking at ACCUMULO-4138 to see how KeyExtent is used to do
> overlaps, and I noticed a *lot* of {{new KeyExtent(new Text(String), ...)}}
> calls, where that first parameter is the Text version of tableId.
> It appears, we even do this practice so much, that KeyExtent has a built-in
> WeakReference caching of these tableIds, so the Java GC we can dedupe them
> and avoid creating too many.
> The thing is... a lot of this attempt to optimize appears to be the result of
> unnecessarily wrapping all these immutable String tableIDs with a Text
> object, yet it doesn't really seem to buy us anything. In most of our API and
> a lot of internal utilities, these are already Strings with references
> elsewhere in the code... so even if we dedupe the Text objects, we're not
> doing anything about these Strings. Worse, we're actually doing some
> protective copying here and there as we pass around these Text objects, using
> TextUtil, etc. This is completely unnecessary, because they were immutable
> before we copied them and wrapped them.
> In some cases, we actually call toString on the text objects to pass them
> around, and they get flip-flopped a few times between String and Text,
> depending on what the internal API accepts.
> At best, using the Text object helps us serialize KeyExtent... but that's
> questionably beneficial since DataOutput can already serialize with:
> {{out.writeUTF(String)}}, so that's not actually that helpful.
> The only other benefit I could possibly see is with compareTo for
> lexicographical comparison, but I have a hard time believing Text can compare
> faster than {{java.lang.String}}, and there shouldn't be any difference in
> the results of the comparison between the UTF-8 encoding of Text and the
> modified UTF encoding which is native to Java Strings, certainly not for the
> base-36 characters and fixed constants (for special tables) we use in
> tableIDs.
> We should just completely strip out all the Text versions of tableId,
> wherever possible. I've already done this as an exercise in the 1.6 branch,
> but it might be potentially risky as these are put in Java maps which don't
> have strict type checking ({{map.get(Object)}}, for instance) and are
> sometimes compared with {{.equals(Object)}}. For what it's worth, the only
> public API this really touches is {{o.a.a.core.data.KeyExtent}}, which was
> only inadvertently in the public API and has since been moved (IIRC). We can
> preserve the old methods for compatibility, if necessary (deprecated, of
> course).
> Also, getting rid of these Text objects, and just sticking with the immutable
> String objects, we'll be able to take advantage of future JVM improvements
> for String deduplication, like
> http://java-performance.info/java-string-deduplication/
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)