[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-11-08 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244848#comment-16244848
 ] 

Jason Brown edited comment on CASSANDRA-9988 at 11/9/17 12:15 AM:
--

Overall, I think the patch is pretty solid. A couple of comments and nits:

- is {{BTreeSearchIterator}} necessary anymore? It's just an empty interface now
- just to confirm, {{FullBTreeSearchIterator}} is just a rename of the previous 
{{BTreeSearchIterator}} class? My diff'ing seems to confirm that, but I'd like 
the extra level of sanity checking.

nits:
- {{BTreeSet}} switch imports back to original format
- in {{LeafBTreeSearchIterator}}, move {{#compareToFirst}} and {{#searchNext}} 
closer to the methods from where they are called. Or better yet, since each is 
only called from one place, just inline the functionality into those calling 
methods.

My biggest problem is with the microbench. I know this ticket has gone through 
a bunch on contortions over it's life, so maybe a little confusion is 
reasonable. However, what I was really hoping to see was a comparison of the 
leaf vs full (tree) iterators, as that's what this whole ticket is about. I 
took liberty of reworking the mircobench to explicitly test the differences 
bewtween leaf and tree iterators, as well as reduce of benchmark executions 
(due to every value between 0 and 31, inclusive, being a param to the 
{{targetIdx}} field. (I simply removed about 2/3 of those values, as I think we 
only really care about testing reading from the beginning, middle, and end of 
the btree entries; anything else just bloats total test execution time). 
Further, as {{#iteratorTree()}} and {{#multiSearchFound}} don't use the 
{{targetIdx}} param, I think they can be moved to another class as they would 
be executed once for each of the entries in {{targetIdx}} (I'm not awatre of a 
way to ignore that in JMH).

My version of your branch with the updated microbench is 
[here|https://github.com/jasobrown/cassandra/tree/9988]

EDIT: forgot to mention that with the updated microbench, I was able to measure 
that the leaf iterator was anywhere from 3-20% faster, depending on the test. 
So, indeed this is a good patch


was (Author: jasobrown):
Overall, I think the patch is pretty solid. A couple of comments and nits:

- is {{BTreeSearchIterator}} necessary anymore? It's just an empty interface now
- just to confirm, {{FullBTreeSearchIterator}} is just a rename of the previous 
{{BTreeSearchIterator}} class? My diff'ing seems to confirm that, but I'd like 
the extra level of sanity checking.

nits:
- {{BTreeSet}} switch imports back to original format
- in {{LeafBTreeSearchIterator}}, move {{#compareToFirst}} and {{#searchNext}} 
closer to the methods from where they are called. Or better yet, since each is 
only called from one place, just inline the functionality into those calling 
methods.

My biggest problem is with the microbench. I know this ticket has gone through 
a bunch on contortions over it's life, so maybe a little confusion is 
reasonable. However, what I was really hoping to see was a comparison of the 
leaf vs full (tree) iterators, as that's what this whole ticket is about. I 
took liberty of reworking the mircobench to explicitly test the differences 
bewtween leaf and tree iterators, as well as reduce of benchmark executions 
(due to every value between 0 and 31, inclusive, being a param to the 
{{targetIdx}} field. (I simply removed about 2/3 of those values, as I think we 
only really care about testing reading from the beginning, middle, and end of 
the btree entries; anything else just bloats total test execution time). 
Further, as {{#iteratorTree()}} and {{#multiSearchFound}} don't use the 
{{targetIdx}} param, I think they can be moved to another class as they would 
be executed once for each of the entries in {{targetIdx}} (I'm not awatre of a 
way to ignore that in JMH).

My version of your branch with the updated microbench is 
[here|https://github.com/jasobrown/cassandra/tree/9988]

EDIT: forgot to mention that with the updated microbench, the leaf iterator was 
anywhere from 3-20% faster, depending on the test.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result.png, 
> 9988-result2.png, 9988-result3.png, 9988-test-result-expsearch.xlsx, 
> 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-test-result3.png, 
> 9988-trunk-new-update.txt, 9988-trunk-new.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit 

[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-11-08 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244848#comment-16244848
 ] 

Jason Brown edited comment on CASSANDRA-9988 at 11/8/17 10:41 PM:
--

Overall, I think the patch is pretty solid. A couple of comments and nits:

- is {{BTreeSearchIterator}} necessary anymore? It's just an empty interface now
- just to confirm, {{FullBTreeSearchIterator}} is just a rename of the previous 
{{BTreeSearchIterator}} class? My diff'ing seems to confirm that, but I'd like 
the extra level of sanity checking.

nits:
- {{BTreeSet}} switch imports back to original format
- in {{LeafBTreeSearchIterator}}, move {{#compareToFirst}} and {{#searchNext}} 
closer to the methods from where they are called. Or better yet, since each is 
only called from one place, just inline the functionality into those calling 
methods.

My biggest problem is with the microbench. I know this ticket has gone through 
a bunch on contortions over it's life, so maybe a little confusion is 
reasonable. However, what I was really hoping to see was a comparison of the 
leaf vs full (tree) iterators, as that's what this whole ticket is about. I 
took liberty of reworking the mircobench to explicitly test the differences 
bewtween leaf and tree iterators, as well as reduce of benchmark executions 
(due to every value between 0 and 31, inclusive, being a param to the 
{{targetIdx}} field. (I simply removed about 2/3 of those values, as I think we 
only really care about testing reading from the beginning, middle, and end of 
the btree entries; anything else just bloats total test execution time). 
Further, as {{#iteratorTree()}} and {{#multiSearchFound}} don't use the 
{{targetIdx}} param, I think they can be moved to another class as they would 
be executed once for each of the entries in {{targetIdx}} (I'm not awatre of a 
way to ignore that in JMH).

My version of your branch with the updated microbench is 
[here|https://github.com/jasobrown/cassandra/tree/9988]

EDIT: forgot to mention that with the updated microbench, the leaf iterator was 
anywhere from 3-20% faster, depending on the test.


was (Author: jasobrown):
Overall, I think the patch is pretty solid. A couple of comments and nits:

- is {{BTreeSearchIterator}} necessary anymore? It's just an empty interface now
- just to confirm, {{FullBTreeSearchIterator}} is just a rename of the previous 
{{BTreeSearchIterator}} class? My diff'ing seems to confirm that, but I'd like 
the extra level of sanity checking.

nits:
- {{BTreeSet}} switch imports back to original format
- in {{LeafBTreeSearchIterator}}, move {{#compareToFirst}} and {{#searchNext}} 
closer to the methods from where they are called. Or better yet, since each is 
only called from one place, just inline the functionality into those calling 
methods.

My biggest problem is with the microbench. I know this ticket has gone through 
a bunch on contortions over it's life, so maybe a little confusion is 
reasonable. However, what I was really hoping to see was a comparison of the 
leaf vs full (tree) iterators, as that's what this whole ticket is about. I 
took liberty of reworking the mircobench to explicitly test the differences 
bewtween leaf and tree iterators, as well as reduce of benchmark executions 
(due to every value between 0 and 31, inclusive, being a param to the 
{{targetIdx}} field. (I simply removed about 2/3 of those values, as I think we 
only really care about testing reading from the beginning, middle, and end of 
the btree entries; anything else just bloats total test execution time). 
Further, as {{#iteratorTree()}} and {{#multiSearchFound}} don't use the 
{{targetIdx}} param, I think they can be moved to another class as they would 
be executed once for each of the entries in {{targetIdx}} (I'm not awatre of a 
way to ignore that in JMH).

My version of your branch with the updated microbench is 
[here|https://github.com/jasobrown/cassandra/tree/9988]

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result.png, 
> 9988-result2.png, 9988-result3.png, 9988-test-result-expsearch.xlsx, 
> 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-test-result3.png, 
> 9988-trunk-new-update.txt, 9988-trunk-new.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-08-08 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118100#comment-16118100
 ] 

Benedict edited comment on CASSANDRA-9988 at 8/8/17 12:27 PM:
--

bq. The original implementation is valid. I think that's another example shows 
the code is not that easy to understand.

Mea culpa, although I think that's more an indictment of my hubris and our 
mutual disinterest.  If fifteen distracted minutes in a JIRA comment window 
procrastinating from real work yields a minor bug, it's probably not *that* 
complicated, and an IDE, test suite and a couple of hours would probably more 
than conquer it.

bq. The graph is the search performance (throughput) for searching target on 
the different positions.

Except that this isn't the same, because the search domain is different with 
different starting positions.  This is also over-counting unrelated test setup 
costs.  One would expect that, with a leaf size of 32, iterating the whole 
contents would be _approx._ 4x faster with exp. search than binary search (and 
5x faster for a single search from the beginning); the fact it is only 3x is 
probably down to those extra costs.  The worst inversion of performance would 
be around 2x (i.e. exp. search being 2x slower than b.search at its worst), 
with logarithmically spaced items - i.e. at position 16, 8, 4, 2, 1.  However, 
in the slowest case for exp. search, we are doing much less work than the 
slowest for binary search.

Of course, with a limit to 32 items the exp. search benefits are much less 
apparent.  If we had a leaf size of 64 it would become much more stark.  With a 
leaf size of 16, it would also become completely irrelevant.




was (Author: benedict):
bq. The original implementation is valid. I think that's another example shows 
the code is not that easy to understand.

Mea culpa, although I think that's more an indictment of my hubris and our 
mutual disinterest.  If fifteen distracted minutes in a JIRA comment window 
procrastinating from real work yields a minor bug, it's probably not *that* 
complicated, and an IDE, test suite and a couple of hours would probably more 
than conquer it.

bq. The graph is the search performance (throughput) for searching target on 
the different positions.

Except that this isn't the same, because the search domain is different with 
different starting positions.  This is also over-counting unrelated test setup 
costs.  One would expect that, with a leaf size of 32, iterating the whole 
contents would be _approx._ 4x faster with exp. search than binary search; the 
fact it is only 3x is probably down to those extra costs.  The worst inversion 
of performance would be around 2x (i.e. exp. search being 2x slower than 
b.search at its worst), with logarithmically spaced items - i.e. at position 
16, 8, 4, 2, 1.  However, in the slowest case for exp. search, we are doing 
much less work than the slowest for binary search.

Of course, with a limit to 32 items the exp. search benefits are much less 
apparent.  If we had a leaf size of 64 it would become much more stark.  With a 
leaf size of 16, it would also become completely irrelevant.



> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 
> 9988-result3.png, 9988-result.png, 9988-test-result3.png, 
> 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 
> 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-08-06 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116055#comment-16116055
 ] 

Jay Zhuang edited comment on CASSANDRA-9988 at 8/7/17 5:05 AM:
---

Here are the test results to search target on the different index with the 
following search algorithms:
1. binary search
2. exponential search
3. binary search optimized for next adjacent value
(Y axis is throughput, higher is better)
!9988-3tests.png|tests!

Here are my takes from the results:
h4. 1. binary search:
 Pros:
  * best on average and evenly distributed;
  * Simple code, use JAVA native API.

Cons:
  * not optimized to search for the adjacent value.

h4. 2. exponential search:
Pros:
  * optimized to search the first and second value for {{Dir.ASC}}.

Cons:
  * very bad average performance: compare to binarySearch it has only half the 
throughput. {{Dir.DESC}} will have the same performance degradation and we do 
use DESC for some places;
  * complicate code, hard to maintain: For example, there's a bug in above 
code, it should be {{while (prob_ub <= ub && (c = 
comparator.compare(keys\[probe_ub], key)) < 0)}} it only impacts the tree size 
{{2^n - 1}} and search for the last element, it's tricky to find out.

I think we could make the algorithm work for {{Dir.DESC}} too, but it will make 
the code even more complicated.

h4. 3. binary search optimized for next adjacent value
Pros:
  * Optimized to search next adjacent value, works for both {{Dir.ASC}} and 
{{Dir.DESC}}

Cons:
  * Average performance is a litter bit worse than binarySearch

I'm not familiar with the whole cassandra code base, so not sure how often we 
search for the adjacent value. Personally, I would still prefer binarySearch:
* consistency with other cassandra code, plus it's the current behave right now 
for btree search;
* smaller mean deviations, could be good for tp99;
* clean code.

But I'm fine with any solutions:
| 1. binary search | 
[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] |
| 2. exponential search | 
[9988-trunk-exp|https://github.com/cooldoger/cassandra/tree/9988-trunk-exp] |
| 3. binary search optimized| 
[9988-trunk-x|https://github.com/cooldoger/cassandra/tree/9988-trunk-x] |


was (Author: jay.zhuang):
Here are the test results to search target on the different index with the 
following search algorithms:
1. binary search
2. exponential search
3. binary search optimized for next adjacent value
!9988-3tests.png|tests!

Here are my takes from the results:
h4. 1. binary search:
 Pros:
  * best on average and evenly distributed;
  * Simple code, use JAVA native API.

Cons:
  * not optimized to search for the adjacent value.

h4. 2. exponential search:
Pros:
  * optimized to search the first and second value for {{Dir.ASC}}.

Cons:
  * very bad average performance: compare to binarySearch it has only half the 
throughput. {{Dir.DESC}} will have the same performance degradation and we do 
use DESC for some places;
  * complicate code, hard to maintain: For example, there's a bug in above 
code, it should be {{while (prob_ub <= ub && (c = 
comparator.compare(keys\[probe_ub], key)) < 0)}} it only impacts the tree size 
{{2^n - 1}} and search for the last element, it's tricky to find out.

I think we could make the algorithm work for {{Dir.DESC}} too, but it will make 
the code even more complicated.

h4. 3. binary search optimized for next adjacent value
Pros:
  * Optimized to search next adjacent value, works for both {{Dir.ASC}} and 
{{Dir.DESC}}

Cons:
  * Average performance is a litter bit worse than binarySearch

I'm not familiar with the whole cassandra code base, so not sure how often we 
search for the adjacent value. Personally, I would still prefer binarySearch:
* consistency with other cassandra code, plus it's the current behave right now 
for btree search;
* smaller mean deviations, could be good for tp99;
* clean code.
But I'm fine with any solutions:
| 1. binary search | 
[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] |
| 2. exponential search | 
[9988-trunk-exp|https://github.com/cooldoger/cassandra/tree/9988-trunk-exp] |
| 3. binary search optimized| 
[9988-trunk-x|https://github.com/cooldoger/cassandra/tree/9988-trunk-x] |

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 
> 9988-result3.png, 9988-result.png, 9988-test-result3.png, 
> 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 
> 9988-test-result.xlsx, 9988-trunk-new.txt, 

[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-29 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106073#comment-16106073
 ] 

Benedict edited comment on CASSANDRA-9988 at 7/29/17 8:22 AM:
--

Assuming your implementation is correct, here is a modified version that works
{code}
 int lb = forwards ? nextPos : lowerBound; // inclusive
 int ub = forwards ? upperBound : nextPos; // inclusive
 int bound = 0;
 int c = -1;
 while (lb + bound < ub && (c = comparator.compare(keys[lb + bound], 
key)) < 0)
 bound = Math.max(bound * 2, 1);
 if (c == 0) { return lb + bound; } // note, by checking == 0, we do 
not need to include it in bsearch, so we do not need to increment lb + bound
return Arrays.binarySearch(keys, lb + (bound/2), Math.min(lb + bound, 
ub + 1), key, comparator);
{code}

This implementation as a whole could be cleaned up a little, and there are a 
variety of ways to bit-twiddle to get {{bound == 0}} on first iteration, but 
this one is at least very clear.  A "faster" variant would be:

{code}
 while (lb + (bound / 2) < ub && (c = comparator.compare(keys[lb + 
(bound / 2)], key)) < 0)
 bound *= 2;
{code}

A faster-still variant would just offset lb by -1 at the start, so there are no 
recurring costs, and restore it after the loop.  But better than this would be 
something like:

{code}
 int lb = forwards ? nextPos : lowerBound; // inclusive
 int ub = forwards ? upperBound : nextPos; // inclusive
 int c = -1;
 int probe_ub = lb;
 int probe_incr = 1;
 while (probe_ub < ub && (c = comparator.compare(keys[probe_ub], key)) 
< 0) {
 lb = probe_ub;
 probe_ub += probe_incr;
 probe_incr *= 2;
 }
 if (c == 0) { return probe_ub; }
return Arrays.binarySearch(keys, lb, Math.min(ub + 1, probe_ub), key, 
comparator);
{code}

This could still be cleaned up a little, in a decent editor.

I haven't looked at the codebase in a long time, and don't have it checked out, 
so haven't consulted the branch iterator version, and don't actually recall 
precisely what algorithm we used.  I don't recall it being a particularly 
elegant approach, though.  In fact, it's possible it only compares the first 
element and then resorts to binary search;  I do not recall for sure if we 
carried exponential search over from the ArrayBackedColumns (or whatever it 
used to be called).

I will note the exponential search as it stands here probably does not behave 
very well for reverse iteration.




was (Author: benedict):
Assuming your implementation is correct, here is a modified version that works
{code}
 int lb = forwards ? nextPos : lowerBound; // inclusive
 int ub = forwards ? upperBound : nextPos; // inclusive
 int bound = 0;
 int c = -1;
 while (lb + bound < ub && (c = comparator.compare(keys[lb + bound], 
key)) < 0)
 bound = Math.max(bound * 2, 1);
 if (c == 0) { return lb + bound; } // note, by checking == 0, we do 
not need to include it in bsearch, so we do not need to increment ub
return Arrays.binarySearch(keys, lb + (bound/2), Math.min(lb + bound, 
ub), key, comparator);
{code}

This implementation as a whole could be cleaned up a little, and there are a 
variety of ways to bit-twiddle to get {{bound == 0}} on first iteration, but 
this one is at least very clear.  A "faster" variant would be:

{code}
 while (lb + (bound / 2) < ub && (c = comparator.compare(keys[lb + 
(bound / 2)], key)) < 0)
 bound *= 2;
{code}

A faster-still variant would just offset lb by -1 at the start, so there are no 
recurring costs, and restore it after the loop.  But better than this would be 
something like:

{code}
 int lb = forwards ? nextPos : lowerBound; // inclusive
 int ub = forwards ? upperBound : nextPos; // inclusive
 int c = -1;
 int probe_ub = lb;
 int probe_incr = 1;
 while (probe_ub < ub && (c = comparator.compare(keys[probe_ub], key)) 
< 0) {
 lb = probe_ub;
 probe_ub += probe_incr;
 probe_incr *= 2;
 }
 if (c == 0) { return probe_ub; }
return Arrays.binarySearch(keys, lb, probe_ub, key, comparator);
{code}

I haven't looked at the codebase in a long time, and don't have it checked out, 
so haven't consulted the branch iterator version, and don't actually recall 
precisely what algorithm we used.  I don't recall it being a particularly 
elegant approach, though.  In fact, it's possible it only compares the first 
element and then resorts to binary search;  I do not recall for sure if we 
carried exponential search over from the ArrayBackedColumns (or whatever it 
used to be called).

I will note the exponential search as it stands here probably does not 

[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-29 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106073#comment-16106073
 ] 

Benedict edited comment on CASSANDRA-9988 at 7/29/17 8:18 AM:
--

Assuming your implementation is correct, here is a modified version that works
{code}
 int lb = forwards ? nextPos : lowerBound; // inclusive
 int ub = forwards ? upperBound : nextPos; // inclusive
 int bound = 0;
 int c = -1;
 while (lb + bound < ub && (c = comparator.compare(keys[lb + bound], 
key)) < 0)
 bound = Math.max(bound * 2, 1);
 if (c == 0) { return lb + bound; } // note, by checking == 0, we do 
not need to include it in bsearch, so we do not need to increment ub
return Arrays.binarySearch(keys, lb + (bound/2), Math.min(lb + bound, 
ub), key, comparator);
{code}

This implementation as a whole could be cleaned up a little, and there are a 
variety of ways to bit-twiddle to get {{bound == 0}} on first iteration, but 
this one is at least very clear.  A "faster" variant would be:

{code}
 while (lb + (bound / 2) < ub && (c = comparator.compare(keys[lb + 
(bound / 2)], key)) < 0)
 bound *= 2;
{code}

A faster-still variant would just offset lb by -1 at the start, so there are no 
recurring costs, and restore it after the loop.  But better than this would be 
something like:

{code}
 int lb = forwards ? nextPos : lowerBound; // inclusive
 int ub = forwards ? upperBound : nextPos; // inclusive
 int c = -1;
 int probe_ub = lb;
 int probe_incr = 1;
 while (probe_ub < ub && (c = comparator.compare(keys[probe_ub], key)) 
< 0) {
 lb = probe_ub;
 probe_ub += probe_incr;
 probe_incr *= 2;
 }
 if (c == 0) { return probe_ub; }
return Arrays.binarySearch(keys, lb, probe_ub, key, comparator);
{code}

I haven't looked at the codebase in a long time, and don't have it checked out, 
so haven't consulted the branch iterator version, and don't actually recall 
precisely what algorithm we used.  I don't recall it being a particularly 
elegant approach, though.  In fact, it's possible it only compares the first 
element and then resorts to binary search;  I do not recall for sure if we 
carried exponential search over from the ArrayBackedColumns (or whatever it 
used to be called).

I will note the exponential search as it stands here probably does not behave 
very well for reverse iteration.

Also, apologies for against-style-guide variable names.  But this comment box 
is too poor an editor to revise it any further.


was (Author: benedict):
Assuming your implementation is correct, here is a modified version that works
{code}
 int lb = forwards ? nextPos : lowerBound; // inclusive
 int ub = forwards ? upperBound : nextPos; // inclusive
 int bound = 0;
 int c = -1;
 while (lb + bound < ub && (c = comparator.compare(keys[lb + bound], 
key)) < 0)
 bound = Math.max(bound * 2, 1);
 if (c == 0) { return lb + bound; } // note, by checking == 0, we do 
not need to include it in bsearch, so we do not need to increment ub
return Arrays.binarySearch(keys, lb + (bound/2), Math.min(lb + bound, 
ub), key, comparator);
{code}

This implementation as a whole could be cleaned up a little, and there are a 
variety of ways to bit-twiddle to get {{bound == 0}} on first iteration, but 
this one is at least very clear.  A "faster" variant would be:

{code}
 while (lb + (bound / 2) < ub && (c = comparator.compare(keys[lb + 
(bound / 2)], key)) < 0)
 bound *= 2;
{code}

A faster-still variant would just offset lb by -1 at the start, so there are no 
recurring costs, and restore it after the loop.  But better than this would be 
something like:

{code}
 int lb = forwards ? nextPos : lowerBound; // inclusive
 int ub = forwards ? upperBound : nextPos; // inclusive
 int c = -1;
 int probe_ub = lb;
 int probe_incr = 1;
 while (probe_ub < ub && (c = comparator.compare(keys[probe_ub], key)) 
< 0) {
 lb = probe_ub;
 probe_ub += probe_incr;
 probe_incr *= 2;
 }
 if (c == 0) { return probe_ub; }
return Arrays.binarySearch(keys, lb, probe_ub, key, comparator);
{code}

better still
I haven't looked at the codebase in a long time, and don't have it checked out, 
so haven't consulted the branch iterator version, and don't actually recall 
precisely what algorithm we used.  I don't recall it being a particularly 
elegant approach, though.  In fact, it's possible it only compares the first 
element and then resorts to binary search;  I do not recall for sure if we 
carried exponential search over from the ArrayBackedColumns (or whatever it 
used to be called).

I will note the 

[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-27 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103033#comment-16103033
 ] 

Benedict edited comment on CASSANDRA-9988 at 7/27/17 10:09 AM:
---

Well, it's been a while since I looked at the internals, but iirc there is an 
extremely common scenario in which we enumerate most of the contents of such an 
iterator via search, and that is in the case of selecting columns from a row.

Since we already have binary search's best case and exponential search's worst 
case in the benchmarks, and given we can expect the inverse scenario to be 
common, why not supplement with that so we can see the two extremes?

Also, FTR: a properly implemented exponential search should only incur 1 
comparison for each search in its best case, as the result of the most recent 
boundary comparison can be saved - if it is zero then no binary search is 
needed over the range.


was (Author: benedict):
Well, it's been a while since I looked at the internals, but iirc there is an 
extremely common scenario in which we enumerate most of the contents of such an 
iterator via search, and that is in the case of selecting columns from a row.

Since we already have binary search's best case and exponential search's worst 
case in the benchmarks, and given we can expect the inverse scenario to be 
common, why not supplement with that so we can see the two extremes?

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, 
> 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-26 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102704#comment-16102704
 ] 

Jay Zhuang edited comment on CASSANDRA-9988 at 7/27/17 4:57 AM:


Hi [~benedict], re-read [your comments 1.5 years' 
ago|https://issues.apache.org/jira/browse/CASSANDRA-9988?focusedCommentId=15117219=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15117219],
 now I see your point:
{quote}
2. We could probably get uniformly better behaviour by implementing exponential 
search for the leaf iterator; this would bring our total number of comparisons 
down from n lg n to n when searching *an approximately equal set of items*, 
without harming our best case complexity.
{quote}
In that case, {{i}} would be very small ({{i}} is where the target is), so if i 
= 1, expSearch ({{log( i ) + log( i )}}) just needs to do 2 comparisons. vs. 
binarySearch 5 comparisons: log(32) = 5 in worst case. Now the question is how 
we should define the test: How many re-seeks should we do? How we choose next 
target? If we randomly select the next target, it's unfair to expSearch, if we 
always select next value (basically use search to iterator the tree) is unfair 
to binSearch. It really depends on the user scenario. Do you have any 
suggestion?


was (Author: jay.zhuang):
Hi [~benedict], re-read [your comments 1.5 years' 
ago|https://issues.apache.org/jira/browse/CASSANDRA-9988?focusedCommentId=15117219=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15117219],
 now I see your point:
{quote}
2. We could probably get uniformly better behaviour by implementing exponential 
search for the leaf iterator; this would bring our total number of comparisons 
down from n lg n to n when searching *an approximately equal set of items*, 
without harming our best case complexity.
{quote}
In that case, {{i}} would be very small ({{i}} is where the target is), so 
expSearch ({{log( i ) + log( i )}}) just needs to do 2 comparisons. vs. 
binarySearch 5 comparisons: log(32) = 5 in worst case. Now the question is how 
we should define the test: How many re-seeks should we do? How we choose next 
target? If we randomly select the next target, it's unfair to expSearch, if we 
always select next value (basically use search to iterator the tree) is unfair 
to binSearch. It really depends on the user scenario. Do you have any 
suggestion?

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, 
> 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-26 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102332#comment-16102332
 ] 

Jay Zhuang edited comment on CASSANDRA-9988 at 7/26/17 9:46 PM:


They're good feedbacks. I did the tests for expSearch:
!9988-test-result3.png|test!
I think the exponentialSearch didn't out perform the binaraySearch is because:
# data size is too small. In this case, it only searches inside of one leaf, 
the maximum data size is 32 (default node size).
In theory, the performance of exponential search is {{[O(log( i ) + log( i 
))|https://en.wikipedia.org/wiki/Exponential_search#Performance]}} ({{i}} is 
where the target is), vs. binary search is {{O(log( n ))}}.
Because exponentialSearch has 2 stages, it could do more comparisons when the 
{{n}} is small.
# Maybe Java API {{Arrays.binarySearch()}} is already doing some optimizations.

So for this JIRA, I don't think it worth doing exponentialSearch.


was (Author: jay.zhuang):
They're good feedbacks. I did the tests for expSearch:
!9988-test-result3.png|test!
I think the exponentialSearch didn't out perform the binaraySearch is because:
# data size is too small. In this case, it only searches inside of one leaf, 
the maximum data size is 32 (default node size).
In theroy, the performance of exponential search is {{[O(log( i ) + log( i 
))|https://en.wikipedia.org/wiki/Exponential_search#Performance]}} ({{i}} is 
where the target is), vs. binary search is {{O(log( n ))}}.
Because exponentialSearch has 2 stages, it could do more compares when the 
{{n}} is small.
# Maybe Java API {{Arrays.binarySearch()}} is already doing some optimizations.

So for this JIRA, I don't think it worth doing exponentialSearch.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-test-result3.png, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-24 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099288#comment-16099288
 ] 

Jay Zhuang edited comment on CASSANDRA-9988 at 7/25/17 12:23 AM:
-

hi [~benedict] I also added tests for different cellSize: 36, 100, 1k and 10K 
(they're generated from random UUID.tostring()). In the graph above, they're 
corresponding to 4 data points in each bTree size section. We don't see a 
significant difference between them. For example {{IteratorTree}}, the first 4 
data points are the ones for one-node-BTree with different cellSize, they have 
the similar improvements (+800%), maybe here is better graph:
!9988-result3.png|result!


was (Author: jay.zhuang):
hi [~benedict] I also added tests for different cellSize: 36, 100, 1k and 10K 
(they're generated from random UUID.tostring()). In the graph above, they're 
corresponding to 4 data points in each bTree size section. We don't see a 
significant difference between them. For example {{IteratorTree}}, the first 4 
data points are the ones for one-node-BTree with different cellSize, they have 
the similar improvements (+800%), maybe here is better graph:
!9988-result2.png|result!

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-trunk-new.txt, 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-21 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097112#comment-16097112
 ] 

Jay Zhuang edited comment on CASSANDRA-9988 at 7/22/17 5:18 AM:


I added tests for different cell size and 10K/100K b-tree size:
| branch | utest |
|[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk]|[circleci#41|https://circleci.com/gh/cooldoger/cassandra/41]|
|[9988-nofix|https://github.com/cooldoger/cassandra/tree/9988-nofix]||
If the data could fit into one leaf node (<=32), the performance is 
significantly improved. Otherwise, there's almost no performance difference, 
regardless of the cell size.
!9988-data.png|data!
!9988-result.png|result!


was (Author: jay.zhuang):
I added tests for different cell size and 10K/100K b-tree size:
| branch | utest |
|[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk]|[circleci#41|https://circleci.com/gh/cooldoger/cassandra/41]|
|[9988-nofix|https://github.com/cooldoger/cassandra/tree/9988-nofix]||
If the data could fit into one leaf node (<=32), the performance is 
significantly improved. Otherwise, there's no performance difference, 
regardless of the cell size.
!9988-data.png|data!
!9988-result.png|result!

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result.png, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-21 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097112#comment-16097112
 ] 

Jay Zhuang edited comment on CASSANDRA-9988 at 7/22/17 5:14 AM:


I added tests for different cell size and 10K/100K b-tree size:
| branch | utest |
|[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk]|[circleci#41|https://circleci.com/gh/cooldoger/cassandra/41]|
|[9988-nofix|https://github.com/cooldoger/cassandra/tree/9988-nofix]||
If the data could fit into one leaf node (<=32), the performance is 
significantly improved. Otherwise, there's no performance difference, 
regardless of the cell size.
!9988-data.png|data!
!9988-result.png|result!


was (Author: jay.zhuang):
I added tests for different cell size and 10K and 100K b-tree size:
| branch | utest |
|[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk]|[circleci#41|https://circleci.com/gh/cooldoger/cassandra/41]|
|[9988-nofix|https://github.com/cooldoger/cassandra/tree/9988-nofix]||
If the data could fit into one leaf node (<=32), the performance is 
significantly improved. Otherwise, there's no performance difference, 
regardless of the cell size.
!9988-data.png|data!
!9988-result.png|result!

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result.png, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-19 Thread Anthony Grasso (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094089#comment-16094089
 ] 

Anthony Grasso edited comment on CASSANDRA-9988 at 7/20/17 5:37 AM:


[~jay.zhuang] two other things I noticed as far as the code goes
* The patch needs to be updated so that it applies cleanly to _trunk_.
* Some minor coding style updates are needed. I have left a comments on the 
[9988-trunk-onecommit-update2 | 
https://github.com/cooldoger/cassandra/commit/c5003812327d4475a7fb29f11686c13eeb50d693]
 branch.

Let me know if you need a hand with either of these. I am happy to make that 
change as well.


was (Author: anthony grasso):
[~jay.zhuang] one other thing I noticed is that the patch needs to be updated 
so that it applies cleanly to _trunk_. Let me know if you need a hand with 
that. I am happy to make that change as well.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-19 Thread Anthony Grasso (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092832#comment-16092832
 ] 

Anthony Grasso edited comment on CASSANDRA-9988 at 7/20/17 5:33 AM:


Have started reviewing the code.

Ran the partition unit tests on the branch
{noformat}
ant test 
-Dtest.name=org.apache.cassandra.db.partition.PartitionImplementationTest
ant test -Dtest.name=org.apache.cassandra.db.partition.PartitionUpdateTest
{noformat}

So far so good.


was (Author: anthony grasso):
Have reviewed the code.

Ran the Microbench tests
{noformat}
ant test -Dtest.name=org.apache.cassandra.test.microbench.CompactionBench
ant test 
-Dtest.name=org.apache.cassandra.test.microbench.BTreeSearchIteratorBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.DirectorySizerBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.FastThreadExecutor
ant test -Dtest.name=org.apache.cassandra.test.microbench.FastThreadLocalBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.MutationBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.OutputStreamBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.OutputStreamBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.OutputStreamBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.PendingRangesBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.ReadWriteTest
ant test 
-Dtest.name=org.apache.cassandra.test.microbench.StreamingHistogramBench
ant test 
-Dtest.name=org.apache.cassandra.test.microbench.PartitionImplementationTest
{noformat}

Ran the partition unit tests
{noformat}
ant test 
-Dtest.name=org.apache.cassandra.db.partition.PartitionImplementationTest
ant test -Dtest.name=org.apache.cassandra.db.partition.PartitionUpdateTest
{noformat}

Changes look good to me.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-04-12 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966767#comment-15966767
 ] 

Jay Zhuang edited comment on CASSANDRA-9988 at 4/13/17 12:21 AM:
-

Rebased the 
[patch|https://github.com/cooldoger/cassandra/commit/c5003812327d4475a7fb29f11686c13eeb50d693.patch].
 Please review: 
[9988-trunk-onecommit-update2|https://github.com/apache/cassandra/compare/trunk...cooldoger:9988-trunk-onecommit-update2?expand=1]


was (Author: jay.zhuang):
Rebased the 
[patch|https://github.com/apache/cassandra/compare/trunk...cooldoger:9988-trunk-onecommit-update2?expand=1]
 on trunk. Please review: 
[9988-trunk-onecommit-update2|https://github.com/apache/cassandra/compare/trunk...cooldoger:9988-trunk-onecommit-update2?expand=1]

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-01-25 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838635#comment-15838635
 ] 

Jay Zhuang edited comment on CASSANDRA-9988 at 1/25/17 9:42 PM:


Thanks Benedict for your quick response. I updated the 
[patch|https://github.com/cooldoger/cassandra/commit/d2e4621542149c68b23cb36d83289d92d722f9a0]
 with your comment.

[~jasobrown] would you please review the patch?


was (Author: jay.zhuang):
Thanks Benedict for your quick response. I updated the patch with your comment.

[~jasobrown] would you please review the patch?

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-01-22 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833718#comment-15833718
 ] 

Jay Zhuang edited comment on CASSANDRA-9988 at 1/22/17 10:06 PM:
-

[Patch|https://github.com/cooldoger/cassandra/commit/8704eb8bbf94d5b556f67386d464a015ca98554b]
 is ready, please review.

Here is the JMH test result before and after the patch, overall it 
significantly improved the iterator for small BTree:
|| 
[TestCode|https://github.com/cooldoger/cassandra/commit/5ae5397e1a4e2d8c5967cb847c2578df018e1def#diff-f687440fb97859f7be2813bf6182ba3eR89]
 || Before the patch || After the patch || Improvement ||
| iteratorBigTreeTest | 171.271 | 167.288 | -2% |
| searchBigTreeTest | 12743.251 | 11833.277 | -7% |
| searchNotFoundBigTreeTest | 11140.872 | 10378.794 | -7% |
| iteratorLeafTreeTest | 5337.291 | 19788.253 | +271% |
| searchFoundLeafTreeTest | 18608.504 | 36809.601 | +98% |
| searchNotFoundLeafTreeTest | 19670.935 | 28880.540 | +47% |
| iteratorOneElemTreeTest | 65983.946 | 414832.010 | +529% |
| searchFoundOneElemTreeTest | 45796.932 | 288174.238 | +529% |
| searchNotFoundOneElemTreeTest | 43176.354 | 284008.493 | +558% |

Here is the code for JMH test without fix: 
https://github.com/cooldoger/cassandra/tree/9988-nofix
I tried exponential search: 
https://github.com/cooldoger/cassandra/commit/a23a3422a894acb72a47975da035bbf8da50fc4b
But the JMH test result is worse (other tests didn't change too much):
||  || Binary search || Exponential search || Improvement ||
| searchFoundLeafTreeTest | 36809.601 | 30074.161 | -18% |
| searchNotFoundLeafTreeTest | 28880.540 | 24540.700 | -15% |
| searchFoundOneElemTreeTest | 288174.238 | 233786.942 | -19% |
| searchNotFoundOneElemTreeTest | 284008.493 | 199549.589 | -30% |

Here are the JMH test results:
{noformat}
No Fix
 [java] BenchmarkMode  Cnt  
Score   Error   Units
 [java] BTreeSearchIteratorBench.iteratorBigTreeTestthrpt   40  
  171.271 ? 3.872  ops/ms
 [java] BTreeSearchIteratorBench.iteratorLeafTreeTest   thrpt   40  
 5337.291 ?   306.129  ops/ms
 [java] BTreeSearchIteratorBench.iteratorOneElemTreeTestthrpt   40  
65983.946 ? 10651.607  ops/ms
 [java] BTreeSearchIteratorBench.searchBigTreeTest  thrpt   40  
12743.251 ?  1171.717  ops/ms
 [java] BTreeSearchIteratorBench.searchFoundLeafTreeTestthrpt   40  
18608.504 ?  1671.129  ops/ms
 [java] BTreeSearchIteratorBench.searchFoundOneElemTreeTest thrpt   40  
45796.932 ?  6543.235  ops/ms
 [java] BTreeSearchIteratorBench.searchNotFoundBigTreeTest  thrpt   40  
11140.872 ?  1355.109  ops/ms
 [java] BTreeSearchIteratorBench.searchNotFoundLeafTreeTest thrpt   40  
19670.935 ?  1775.676  ops/ms
 [java] BTreeSearchIteratorBench.searchNotFoundOneElemTreeTest  thrpt   40  
43176.354 ?  6862.149  ops/ms

Fix [Binary Search]
 [java] BenchmarkMode  Cnt  
 Score   Error   Units
 [java] BTreeSearchIteratorBench.iteratorBigFullTreeTestthrpt   40  
   147.093 ? 7.968  ops/ms
 [java] BTreeSearchIteratorBench.iteratorBigTreeTestthrpt   40  
   167.288 ? 4.029  ops/ms
 [java] BTreeSearchIteratorBench.iteratorLeafTreeTest   thrpt   40  
 19788.253 ?   887.529  ops/ms
 [java] BTreeSearchIteratorBench.iteratorOneElemTreeTestthrpt   40  
414832.010 ? 16663.043  ops/ms
 [java] BTreeSearchIteratorBench.searchBigTreeTest  thrpt   40  
 11833.277 ?  1290.791  ops/ms
 [java] BTreeSearchIteratorBench.searchFoundLeafTreeTestthrpt   40  
 36809.601 ?  2403.004  ops/ms
 [java] BTreeSearchIteratorBench.searchFoundOneElemTreeTest thrpt   40  
288174.238 ? 11618.352  ops/ms
 [java] BTreeSearchIteratorBench.searchNotFoundBigTreeTest  thrpt   40  
 10378.794 ?  1229.196  ops/ms
 [java] BTreeSearchIteratorBench.searchNotFoundLeafTreeTest thrpt   40  
 28880.540 ?  1875.317  ops/ms
 [java] BTreeSearchIteratorBench.searchNotFoundOneElemTreeTest  thrpt   40  
284008.493 ?  7929.286  ops/ms

Fix [Exponential search]
 [java] BenchmarkMode  Cnt  
 Score   Error   Units
 [java] BTreeSearchIteratorBench.iteratorBigFullTreeTestthrpt   40  
   164.849 ? 3.571  ops/ms
 [java] BTreeSearchIteratorBench.iteratorBigTreeTestthrpt   40  
   161.228 ? 6.219  ops/ms
 [java] BTreeSearchIteratorBench.iteratorLeafTreeTest   thrpt   40  
 19828.490 ?   727.999  ops/ms
 [java] BTreeSearchIteratorBench.iteratorOneElemTreeTestthrpt   40  
397640.373 ? 18525.227  ops/ms
 [java] BTreeSearchIteratorBench.searchBigTreeTest  thrpt   40  
 11816.892 ?  1305.738  ops/ms
 [java] 

[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

2017-01-16 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824837#comment-15824837
 ] 

Jay Zhuang edited comment on CASSANDRA-9988 at 1/17/17 1:56 AM:


Can I take this one? I'd like to work on this JIRA.


was (Author: jay.zhuang):
Can I take this one?

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Priority: Minor
>  Labels: patch
> Fix For: 3.x
>
> Attachments: trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)