[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

Will Murnane (JIRA) Thu, 22 Sep 2016 07:28:48 -0700

    [ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513435#comment-15513435
 ]


Will Murnane commented on ACCUMULO-4468:
----------------------------------------

[~elserj] I'm not sure why customVanilla and standardEquals behave differently. 
The difference is small. Perhaps it's due to a difference between the JDK used 
to compile the standard Accumulo JAR and the one used to compile the benchmark 
code? Maybe there are effects from having the code loaded from a small JAR 
versus a large one? Maybe the custom WillKey class gets laid out in memory 
differently, and it hits the instruction cache differently? This is the problem 
with benchmarking...

RE: generation of data, yeah, the current test data... leaves something to be 
desired. This was basically the least-worst mechanism I could come up with in 5 
minutes to generate some test data that kinda-sorta resembles our production 
data. If anyone has a better strategy I'm willing to do a little legwork 
testing other data sets.

[~kturner] The parts of the key are stored as separate arrays somewhere on the 
heap, so the problem of row equality is somewhat different from the problem of 
comparing two contiguous byte arrays. That said, maybe there would be benefits 
to storing all the pieces of the Key in a single byte array, and maintaining 
indices into it to track the individual parts, rather than several smaller 
arrays... That's a big refactor, though, for an unknown change in performance.
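
To make the single-array idea concrete, here is a rough sketch. It is not 
Accumulo code: FlatKey, its fields, and prefixEquals are hypothetical names 
made up for illustration, and I have not measured whether this layout is 
actually faster.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical layout: all key parts packed into one array, with offsets.
final class FlatKey {
    private final byte[] data; // row | colFam | colQual | colVis, back to back
    private final int[] ends;  // ends[i] = exclusive end offset of part i

    FlatKey(byte[]... parts) {
        ends = new int[parts.length];
        int total = 0;
        for (int i = 0; i < parts.length; i++) {
            total += parts[i].length;
            ends[i] = total;
        }
        data = new byte[total];
        int pos = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, data, pos, p.length);
            pos += p.length;
        }
    }

    // Convenience factory for the example; encodes each part as UTF-8.
    static FlatKey of(String... parts) {
        byte[][] bytes = new byte[parts.length][];
        for (int i = 0; i < parts.length; i++) {
            bytes[i] = parts[i].getBytes(StandardCharsets.UTF_8);
        }
        return new FlatKey(bytes);
    }

    // True if the first `count` parts of both keys are equal.
    boolean prefixEquals(FlatKey other, int count) {
        if (count < 1 || count > ends.length || count > other.ends.length) {
            throw new IllegalArgumentException("bad part count: " + count);
        }
        // Part boundaries must match, or ("ab","c") would equal ("a","bc").
        for (int i = 0; i < count; i++) {
            if (ends[i] != other.ends[i]) return false;
        }
        // One contiguous scan, starting from the end of the last part.
        for (int i = ends[count - 1] - 1; i >= 0; i--) {
            if (data[i] != other.data[i]) return false;
        }
        return true;
    }
}
```

With this layout, comparing the first three parts of two keys is a single 
backward pass over one array instead of three separate array comparisons.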

I think it would be worth revisiting the comparison mechanism in isEqual, too, 
doing something like the Unsafe method used in Hadoop's FastByteComparisons 
class but going in reverse. The CPU's speculative prefetch should work in 
either direction, but doing the comparison byte-at-a-time is going to be more 
expensive than the 64-bit comparisons that FastByteComparisons does. But that's 
a topic for another ticket ;)
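
For what it's worth, a minimal sketch of what I mean by word-at-a-time 
comparison in reverse. This is not Hadoop's FastByteComparisons: it uses 
ByteBuffer.getLong instead of Unsafe, it only answers equality (not 
ordering), and ReverseEquals/equalsFromEnd are names I made up.

```java
import java.nio.ByteBuffer;

final class ReverseEquals {
    // Equality check that walks from the end of the arrays toward the start,
    // eight bytes at a time where possible.
    static boolean equalsFromEnd(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        ByteBuffer ba = ByteBuffer.wrap(a);
        ByteBuffer bb = ByteBuffer.wrap(b);
        int i = a.length;
        // Whole 8-byte words, starting at the tail. Byte order does not
        // matter here because we only test for equality.
        while (i >= 8) {
            i -= 8;
            if (ba.getLong(i) != bb.getLong(i)) return false;
        }
        // Fewer than 8 bytes remain at the front; compare them one by one.
        while (i > 0) {
            i--;
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

Whether this beats the byte-at-a-time loop in practice would need to be 
benchmarked; the sketch only shows the shape of the loop.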

> accumulo.core.data.Key.equals(Key, PartialKey) improvement
> ----------------------------------------------------------
>
>                 Key: ACCUMULO-4468
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4468
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.8.0
>            Reporter: Will Murnane
>            Priority: Trivial
>              Labels: newbie, performance
>         Attachments: benchmark.tar.gz, key_comparison.patch
>
>
> In the Key.equals(Key, PartialKey) overload, the current method compares 
> starting at the beginning of the key, and works its way toward the end. This 
> functions correctly, of course, but one of the typical uses of this method is 
> to compare adjacent rows to break them into larger chunks. For example, 
> accumulo.core.iterators.Combiner repeatedly calls this method with subsequent 
> pairs of keys.
> I have a patch which reverses the comparison order. That is, if the method is 
> called with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf, 
> and finally row. This (marginally) improves the speed of comparisons in the 
> relatively common case where only the last part is changing, with less 
> complex code.
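
The reversed comparison order described above can be sketched roughly as 
follows. This is not the attached key_comparison.patch: SketchKey and its 
Depth enum are hypothetical stand-ins for Key and PartialKey, reduced to the 
four byte[] parts mentioned in the description.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for Key; only the four byte[] parts are modeled.
final class SketchKey {
    enum Depth { ROW, ROW_COLFAM, ROW_COLFAM_COLQUAL, ROW_COLFAM_COLQUAL_COLVIS }

    final byte[] row, colFam, colQual, colVis;

    SketchKey(String row, String colFam, String colQual, String colVis) {
        this.row = utf8(row);
        this.colFam = utf8(colFam);
        this.colQual = utf8(colQual);
        this.colVis = utf8(colVis);
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Compare the deepest requested part first and work back toward the row,
    // so keys that differ only in a late part are rejected early.
    boolean equalsReversed(SketchKey other, Depth depth) {
        switch (depth) {
            case ROW_COLFAM_COLQUAL_COLVIS:
                if (!Arrays.equals(colVis, other.colVis)) return false;
                // fall through
            case ROW_COLFAM_COLQUAL:
                if (!Arrays.equals(colQual, other.colQual)) return false;
                // fall through
            case ROW_COLFAM:
                if (!Arrays.equals(colFam, other.colFam)) return false;
                // fall through
            case ROW:
                return Arrays.equals(row, other.row);
            default:
                throw new AssertionError(depth);
        }
    }
}
```

The fall-through switch is what keeps the code short: each depth adds one 
comparison in front of the shallower ones.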



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
