[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

Benedict (JIRA) Tue, 08 Aug 2017 05:28:30 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118100#comment-16118100
 ]


Benedict edited comment on CASSANDRA-9988 at 8/8/17 12:27 PM:
--------------------------------------------------------------

bq. The original implementation is valid. I think that's another example shows 
the code is not that easy to understand.

Mea culpa, although I think that's more an indictment of my hubris and our 
mutual disinterest.  If fifteen distracted minutes in a JIRA comment window 
procrastinating from real work yields a minor bug, it's probably not *that* 
complicated, and an IDE, test suite and a couple of hours would probably more 
than conquer it.

bq. The graph is the search performance (throughput) for searching target on 
the different positions.

Except that this isn't the same, because the search domain is different with 
different starting positions.  This is also over-counting unrelated test setup 
costs.  One would expect that, with a leaf size of 32, iterating the whole 
contents would be _approx._ 4x faster with exp. search than binary search (and 
5x faster for a single search from the beginning); the fact it is only 3x is 
probably down to those extra costs.  The worst inversion of performance would 
be around 2x (i.e. exp. search being 2x slower than b.search at its worst), 
with logarithmically spaced items - i.e. at position 16, 8, 4, 2, 1.  However, 
in the slowest case for exp. search, we are doing much less work than the 
slowest for binary search.

Of course, with a limit to 32 items the exp. search benefits are much less 
apparent.  If we had a leaf size of 64 it would become much more stark.  With a 
leaf size of 16, it would also become completely irrelevant.




was (Author: benedict):
bq. The original implementation is valid. I think that's another example shows 
the code is not that easy to understand.

Mea culpa, although I think that's more an indictment of my hubris and our 
mutual disinterest.  If fifteen distracted minutes in a JIRA comment window 
procrastinating from real work yields a minor bug, it's probably not *that* 
complicated, and an IDE, test suite and a couple of hours would probably more 
than conquer it.

bq. The graph is the search performance (throughput) for searching target on 
the different positions.

Except that this isn't the same, because the search domain is different with 
different starting positions.  This is also over-counting unrelated test setup 
costs.  One would expect that, with a leaf size of 32, iterating the whole 
contents would be _approx._ 4x faster with exp. search than binary search; the 
fact it is only 3x is probably down to those extra costs.  The worst inversion 
of performance would be around 2x (i.e. exp. search being 2x slower than 
b.search at its worst), with logarithmically spaced items - i.e. at position 
16, 8, 4, 2, 1.  However, in the slowest case for exp. search, we are doing 
much less work than the slowest for binary search.

Of course, with a limit to 32 items the exp. search benefits are much less 
apparent.  If we had a leaf size of 64 it would become much more stark.  With a 
leaf size of 16, it would also become completely irrelevant.



> Introduce leaf-only iterator
> ----------------------------
>
>                 Key: CASSANDRA-9988
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Benedict
>            Assignee: Jay Zhuang
>            Priority: Minor
>              Labels: patch
>             Fix For: 4.0
>
>         Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 
> 9988-result3.png, 9988-result.png, 9988-test-result3.png, 
> 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 
> 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

Reply via email to