[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

Josh Elser (JIRA) Wed, 21 Sep 2016 11:20:35 -0700

    [ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510743#comment-15510743
 ]


Josh Elser commented on ACCUMULO-4468:
--------------------------------------

{quote}
bq.    Do you have other examples of where this might be used in a tight loop?

I think there are lots of other examples in Accumulo itself; for example, in 
iterators.system.DeletingIterator and iterators.user.TransformingIterator, this 
method is called in a loop. In our use case, we're doing a similar thing: 
building a larger object out of multiple rows, by finding groups of rows which 
are equal under ROW_COLFAM. When each object is built from only a few rows, the 
CF equality comparison returns false pretty often (which is to be expected), 
but only after comparing row IDs, which are always the same in practice.
{quote}

Perhaps my concern didn't come across. From my perspective: I am concerned with 
a performance change that makes one case better. We need to understand if the 
case you outlined is "the norm" or "the exception". I was hoping you had 
context on this.

{quote}
bq.  How did you test this? What types of numbers did you see?

I haven't been able to install it on a cluster to test. The test suite does 
pass with this patch applied. I think it's a minor change; in the "rows are 
equal" case the same amount of work is done as with the existing code, although 
the parts are accessed in the opposite order. They're still compared 
mostly-in-order, as isEqual does, but the comment in that function was 
inspiration to try reversing the comparison order.

Aside from performance, the code seems cleaner to me: there's no more 
repetition of e.g. the check of row equality. The bytecode (with Oracle javac 
1.8.0_92) is substantially smaller: 389 bytes versus 167.
{quote}

At risk of being anti-social, I am -1 on any change for performance without 
numbers coming with it. There are many great tools out there to benchmark 
changes and don't necessarily require use of a cluster (it might actually be 
harder to test on a cluster). 
[JMH|http://openjdk.java.net/projects/code-tools/jmh/] and [Google 
Caliper|https://github.com/google/caliper] are two good tools for 
micro-benchmarking.

> accumulo.core.data.Key.equals(Key, PartialKey) improvement
> ----------------------------------------------------------
>
>                 Key: ACCUMULO-4468
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4468
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.8.0
>            Reporter: Will Murnane
>            Priority: Trivial
>              Labels: newbie, performance
>         Attachments: key_comparison.patch
>
>
> In the Key.equals(Key, PartialKey) overload, the current method compares 
> starting at the beginning of the key, and works its way toward the end. This 
> functions correctly, of course, but one of the typical uses of this method is 
> to compare adjacent rows to break them into larger chunks. For example, 
> accumulo.core.iterators.Combiner repeatedly calls this method with subsequent 
> pairs of keys.
> I have a patch which reverses the comparison order. That is, if the method is 
> called with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf, 
> and finally row. This (marginally) improves the speed of comparisons in the 
> relatively common case where only the last part is changing, with less 
> complex code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

Reply via email to