[
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510743#comment-15510743
]
Josh Elser commented on ACCUMULO-4468:
--------------------------------------
{quote}
bq. Do you have other examples of where this might be used in a tight loop?
I think there are lots of other examples in Accumulo itself; for example, in
iterators.system.DeletingIterator and iterators.user.TransformingIterator, this
method is called in a loop. In our use case, we're doing a similar thing:
building a larger object out of multiple rows, by finding groups of rows which
are equal under ROW_COLFAM. When each object is built from only a few rows, the
CF equality comparison returns false pretty often (which is to be expected),
but only after comparing row IDs, which are always the same in practice.
{quote}
Perhaps my concern didn't come across. From my perspective: I am concerned with
a performance change that makes one case better. We need to understand if the
case you outlined is "the norm" or "the exception". I was hoping you had
context on this.
{quote}
bq. How did you test this? What types of numbers did you see?
I haven't been able to install it on a cluster to test. The test suite does
pass with this patch applied. I think it's a minor change; in the "rows are
equal" case the same amount of work is done as with the existing code, although
the parts are accessed in the opposite order. They're still compared
mostly-in-order, as isEqual does, but the comment in that function was
inspiration to try reversing the comparison order.
Aside from performance, the code seems cleaner to me: there's no more
repetition of e.g. the check of row equality. The bytecode (with Oracle javac
1.8.0_92) is substantially smaller: 389 bytes versus 167.
{quote}
At risk of being anti-social, I am -1 on any change for performance without
numbers coming with it. There are many great tools out there to benchmark
changes and don't necessarily require use of a cluster (it might actually be
harder to test on a cluster).
[JMH|http://openjdk.java.net/projects/code-tools/jmh/] and [Google
Caliper|https://github.com/google/caliper] are two good tools for
micro-benchmarking.
> accumulo.core.data.Key.equals(Key, PartialKey) improvement
> ----------------------------------------------------------
>
> Key: ACCUMULO-4468
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4468
> Project: Accumulo
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.8.0
> Reporter: Will Murnane
> Priority: Trivial
> Labels: newbie, performance
> Attachments: key_comparison.patch
>
>
> In the Key.equals(Key, PartialKey) overload, the current method compares
> starting at the beginning of the key, and works its way toward the end. This
> functions correctly, of course, but one of the typical uses of this method is
> to compare adjacent rows to break them into larger chunks. For example,
> accumulo.core.iterators.Combiner repeatedly calls this method with subsequent
> pairs of keys.
> I have a patch which reverses the comparison order. That is, if the method is
> called with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf,
> and finally row. This (marginally) improves the speed of comparisons in the
> relatively common case where only the last part is changing, with less
> complex code.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)