[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-11-11 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248740#comment-16248740
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

Thanks [~jasobrown] for the review.
bq. just to confirm, {{FullBTreeSearchIterator}} is just a rename of the 
previous {{BTreeSearchIterator}} class? My diff'ing seems to confirm that, but 
I'd like the extra level of sanity checking.
Yes.
bq. {{BTreeSet}} switch imports back to original format
Updated.
bq. in {{LeafBTreeSearchIterator}}, move {{#compareToFirst}} and 
{{#searchNext}} closer to the methods from where they are called. Or better 
yet, since each is only called from one place, just inline the functionality 
into those calling methods.
Updated. Removed {{#compareToFirst}} and moved {{#searchNext}}

For microbench, I was running the same microbench test with and without the 
patch. Having them in the same test would be easier to compare the test 
results. Here is updated branch:
| Branch | uTest |
| [9988-update|https://github.com/cooldoger/cassandra/tree/9988-update] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/9988-update.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/9988-update]
 |

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result.png, 
> 9988-result2.png, 9988-result3.png, 9988-test-result-expsearch.xlsx, 
> 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-test-result3.png, 
> 9988-trunk-new-update.txt, 9988-trunk-new.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-11-08 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244848#comment-16244848
 ] 

Jason Brown commented on CASSANDRA-9988:


Overall, I think the patch is pretty solid. A couple of comments and nits:

- is {{BTreeSearchIterator}} necessary anymore? It's just an empty interface now
- just to confirm, {{FullBTreeSearchIterator}} is just a rename of the previous 
{{BTreeSearchIterator}} class? My diff'ing seems to confirm that, but I'd like 
the extra level of sanity checking.

nits:
- {{BTreeSet}} switch imports back to original format
- in {{LeafBTreeSearchIterator}}, move {{#compareToFirst}} and {{#searchNext}} 
closer to the methods from where they are called. Or better yet, since each is 
only called from one place, just inline the functionality into those calling 
methods.

My biggest problem is with the microbench. I know this ticket has gone through 
a bunch on contortions over it's life, so maybe a little confusion is 
reasonable. However, what I was really hoping to see was a comparison of the 
leaf vs full (tree) iterators, as that's what this whole ticket is about. I 
took liberty of reworking the mircobench to explicitly test the differences 
bewtween leaf and tree iterators, as well as reduce of benchmark executions 
(due to every value between 0 and 31, inclusive, being a param to the 
{{targetIdx}} field. (I simply removed about 2/3 of those values, as I think we 
only really care about testing reading from the beginning, middle, and end of 
the btree entries; anything else just bloats total test execution time). 
Further, as {{#iteratorTree()}} and {{#multiSearchFound}} don't use the 
{{targetIdx}} param, I think they can be moved to another class as they would 
be executed once for each of the entries in {{targetIdx}} (I'm not awatre of a 
way to ignore that in JMH).

My version of your branch with the updated microbench is 
[here|https://github.com/jasobrown/cassandra/tree/9988]

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result.png, 
> 9988-result2.png, 9988-result3.png, 9988-test-result-expsearch.xlsx, 
> 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-test-result3.png, 
> 9988-trunk-new-update.txt, 9988-trunk-new.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-08-09 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120862#comment-16120862
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

[~Anthony Grasso] would you please final review the patch and commit it?

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 
> 9988-result3.png, 9988-result.png, 9988-test-result3.png, 
> 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 
> 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-08-08 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118100#comment-16118100
 ] 

Benedict commented on CASSANDRA-9988:
-

bq. The original implementation is valid. I think that's another example shows 
the code is not that easy to understand.

Mea culpa, although I think that's more an indictment of my hubris and our 
mutual disinterest.  If fifteen distracted minutes in a JIRA comment window 
procrastinating from real work yields a minor bug, it's probably not *that* 
complicated, and an IDE, test suite and a couple of hours would probably more 
than conquer it.

bq. The graph is the search performance (throughput) for searching target on 
the different positions.

Except that this isn't the same, because the search domain is different with 
different starting positions.  This is also over-counting unrelated test setup 
costs.  One would expect that, with a leaf size of 32, iterating the whole 
contents would be _approx._ 4x faster with exp. search than binary search; the 
fact it is only 3x is probably down to those extra costs.  The worst inversion 
of performance would be around 2x (i.e. exp. search being 2x slower than 
b.search at its worst), with logarithmically spaced items - i.e. at position 
16, 8, 4, 2, 1.  However, in the slowest case for exp. search, we are doing 
much less work than the slowest for binary search.

Of course, with a limit to 32 items the exp. search benefits are much less 
apparent.  If we had a leaf size of 64 it would become much more stark.  With a 
leaf size of 16, it would also become completely irrelevant.



> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 
> 9988-result3.png, 9988-result.png, 9988-test-result3.png, 
> 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 
> 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-08-07 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117669#comment-16117669
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

I created CASSANDRA-13748 for expSearch follow up. Rebased the patch to the 
current trunk and squashed the changes:
| branch | uTest |
| [9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] | 
[circleci#74 passed|https://circleci.com/gh/cooldoger/cassandra/74] |

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 
> 9988-result3.png, 9988-result.png, 9988-test-result3.png, 
> 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 
> 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-08-07 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117210#comment-16117210
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

Hi [~benedict] thanks for the quick response.

The graph is the search performance (throughput) for searching target on the 
different positions. The X axis is the position, the Y axis is the throughput, 
the bigger is the better. For multi-seeks, we can assume the current position 
is 0, so if the next search value is the next adjacent one (position 1), 
expSearch is 3 times better than binSearch, but binSearchOptimized could 
actually do slightly better than expSearch in that case.

{quote}
if we want just a random 8 items in a 32 item iterator, we should expect exp. 
search to win on average
{quote}
Yes, I agree. expSearch should out perform binSearch if we choose random 8 
items out of 32, as these 8 items should be close to each other.

{quote}
I did indicate my changes assumed your implementation was valid (I did not 
attempt to verify it, and the original looks to be the source of the bug).
{quote}
The original implementation is valid. I think that's another example shows the 
code is not that easy to understand.
If you follows the original implementation, it should be: {{return 
Arrays.binarySearch(keys, lb, Math.min(ub + 1, probe_ub {color:red}+ 1{color}), 
key, comparator);}} baiscally it's just one-off issue.

{quote}
I will, separately, note that I suspect we're both well past interest in this 
discussion. So if you'd rather just go with binary search, feel free.
{quote}
Yes, I agree, and I'm really appreciated your reviews and comments. Let's 
create another JIRA to look into the adoption of expSearch, not only for this 
case, it may improve performance for other places too. and it should be in util/

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 
> 9988-result3.png, 9988-result.png, 9988-test-result3.png, 
> 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 
> 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-08-07 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116363#comment-16116363
 ] 

Benedict commented on CASSANDRA-9988:
-

I will, separately, note that I suspect we're both well past interest in this 
discussion.  So if you'd rather just go with binary search, feel free.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 
> 9988-result3.png, 9988-result.png, 9988-test-result3.png, 
> 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 
> 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-08-07 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116265#comment-16116265
 ] 

Benedict commented on CASSANDRA-9988:
-

bq. very bad average performance

Could you clarify exactly what the graph is showing?  Is it what we agreed to 
test, i.e. a full consumption of the iterator via calls to {{next()}} ?  
Because I'm pretty sure the "average performance" of binarysearch in this case 
is worse by a much larger margin.

We really need a range of data points to make a decision on, so we can put the 
whole thing to bed for good.

We can also (if we care) tune exp. search's variance by setting the first 
distance it jumps.  We currently jump to the immediately following item - if we 
jump 4 items first, we take between 3 and 7 comparisons, as opposed to between 
1 and 9.  However, there is advantage to higher variance, since the lower cost 
operations are going to be encountered more likely when we consume multiple 
items from the iterator. 

bq. optimized to search the first and second value

exp. search is optimised for any _sequence_ of values.  i.e., if we want just a 
random 8 items in a 32 item iterator, we should expect exp. search to win on 
average.  *If you would like, you could modify your test overall to simply 
search for a random number of items, randomly selected from the iterator's 
contents.*  This should favour exp. search, esp. as the number of random items 
increases.  In real systems, selections are more likely to bias towards 
adjacent items than purely random, so any such advantage will probably be 
understated.


bq. complicate code, hard to maintain

I did indicate my changes assumed your implementation was valid (I did not 
attempt to verify it, and the original looks to be the source of the bug).  
Since there was minimal effort on both our parts to just test the hypothesis, I 
don't think this should be a major concern.  This code snippet is trivial to 
add perfect test coverage for, so we could randomly generate code until it 
worked if we wanted.

bq. Dir.DESC will have ... performance degradation 

Clearly, the algorithm as stands would not be viable for DESC, which I did 
indicate in my prior comment.  I had hoped we could simply validate its 
behaviour and then decide if it is worth writing one that is agnostic to 
direction.  

bq. I'm not familiar with the whole cassandra code base, so not sure how often 
we search for the adjacent value

As previously indicated, I'm pretty sure we, for _every query_, consume 
iterators in a manner that touches adjacent (or nearly) items for most of an 
iterator's contents.  We do this for result row serialization and 
deserialization, and we do it for query column selection.  i.e., we do this a 
great deal.  That is, unless my recollection is faulty or the codebase has 
changed significantly.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 
> 9988-result3.png, 9988-result.png, 9988-test-result3.png, 
> 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 
> 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-08-06 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116055#comment-16116055
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

Here are the test results to search target on the different index with the 
following search algorithms:
1. binary search
2. exponential search
3. binary search optimized for next adjacent value
!9988-3tests.png|tests!

Here are my takes from the results:
h4. 1. binary search:
 Pros:
  * best on average and evenly distributed;
  * Simple code, use JAVA native API.

Cons:
  * not optimized to search for the adjacent value.

h4. 2. exponential search:
Pros:
  * optimized to search the first and second value for {{Dir.ASC}}.

Cons:
  * very bad average performance: compare to binarySearch it has only half the 
throughput. {{Dir.DESC}} will have the same performance degradation and we do 
use DESC for some places;
  * complicate code, hard to maintain: For example, there's a bug in above 
code, it should be {{while (prob_ub <= ub && (c = 
comparator.compare(keys\[probe_ub], key)) < 0)}} it only impacts the tree size 
{{2^n - 1}} and search for the last element, it's tricky to find out.

I think we could make the algorithm work for {{Dir.DESC}} too, but it will make 
the code even more complicated.

h4. 3. binary search optimized for next adjacent value
Pros:
  * Optimized to search next adjacent value, works for both {{Dir.ASC}} and 
{{Dir.DESC}}

Cons:
  * Average performance is a litter bit worse than binarySearch

I'm not familiar with the whole cassandra code base, so not sure how often we 
search for the adjacent value. Personally, I would still prefer binarySearch:
* consistency with other cassandra code, plus it's the current behave right now 
for btree search;
* smaller mean deviations, could be good for tp99;
* clean code.
But I'm fine with any solutions:
| 1. binary search | 
[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] |
| 2. exponential search | 
[9988-trunk-exp|https://github.com/cooldoger/cassandra/tree/9988-trunk-exp] |
| 3. binary search optimized| 
[9988-trunk-x|https://github.com/cooldoger/cassandra/tree/9988-trunk-x] |

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 
> 9988-result3.png, 9988-result.png, 9988-test-result3.png, 
> 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 
> 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-29 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106073#comment-16106073
 ] 

Benedict commented on CASSANDRA-9988:
-

Assuming your implementation is correct, here is a modified version that works
{code}
 int lb = forwards ? nextPos : lowerBound; // inclusive
 int ub = forwards ? upperBound : nextPos; // inclusive
 int bound = 0;
 int c = -1;
 while (lb + bound < ub && (c = comparator.compare(keys[lb + bound], 
key)) < 0)
 bound = Math.max(bound * 2, 1);
 if (c == 0) { return lb + bound; } // note, by checking == 0, we do 
not need to include it in bsearch, so we do not need to increment ub
return Arrays.binarySearch(keys, lb + (bound/2), Math.min(lb + bound, 
ub), key, comparator);
{code}

This implementation as a whole could be cleaned up a little, and there are a 
variety of ways to bit-twiddle to get {{bound == 0}} on first iteration, but 
this one is at least very clear.  A "faster" variant would be:

{code}
 while (lb + (bound / 2) < ub && (c = comparator.compare(keys[lb + 
(bound / 2)], key)) < 0)
 bound *= 2;
{code}

A faster-still variant would just offset lb by -1 at the start, so there are no 
recurring costs, and restore it after the loop.  But better than this would be 
something like:

{code}
 int lb = forwards ? nextPos : lowerBound; // inclusive
 int ub = forwards ? upperBound : nextPos; // inclusive
 int c = -1;
 int probe_ub = lb;
 int probe_incr = 1;
 while (probe_ub < ub && (c = comparator.compare(keys[probe_ub], key)) 
< 0) {
 lb = probe_ub;
 probe_ub += probe_incr;
 probe_incr *= 2;
 }
 if (c == 0) { return probe_ub; }
return Arrays.binarySearch(keys, lb, probe_ub, key, comparator);
{code}

better still
I haven't looked at the codebase in a long time, and don't have it checked out, 
so haven't consulted the branch iterator version, and don't actually recall 
precisely what algorithm we used.  I don't recall it being a particularly 
elegant approach, though.  In fact, it's possible it only compares the first 
element and then resorts to binary search;  I do not recall for sure if we 
carried exponential search over from the ArrayBackedColumns (or whatever it 
used to be called).

I will note the exponential search as it stands here probably does not behave 
very well for reverse iteration.

Also, apologies for against-style-guide variable names.  But this comment box 
is too poor an editor to revise it any further.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, 
> 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-28 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105981#comment-16105981
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

Hi [~benedict], would you please point me the exponential search implementation 
for the branch iterator?

I tried to improve the 
[expSearch|https://github.com/cooldoger/cassandra/commit/a23a3422a894acb72a47975da035bbf8da50fc4b],
 here is [the improved 
version|https://github.com/cooldoger/cassandra/commit/7330646e553afd4b4b0011f6163bf9032f9cb605]
 which will do 1 comparison if it finds the target at the first stage. But 
that's optimized to find the target on {{2^i}} position, doesn't cover position 
0. To cover position 0 (the first element), I need to add more checks and code, 
it makes the code hard to understand.
And if we really do that, I could have the same optimization for binarySearch 
(manually check the first 1 or 2 elements), which results in the similar 
improvement.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, 
> 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-27 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103033#comment-16103033
 ] 

Benedict commented on CASSANDRA-9988:
-

Well, it's been a while since I looked at the internals, but iirc there is an 
extremely common scenario in which we enumerate most of the contents of such an 
iterator via search, and that is in the case of selecting columns from a row.

Since we already have binary search's best case and exponential search's worst 
case in the benchmarks, and given we can expect the inverse scenario to be 
common, why not supplement with that so we can see the two extremes?

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, 
> 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-26 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102704#comment-16102704
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

Hi [~benedict], re-read [your comments 1.5 years' 
ago|https://issues.apache.org/jira/browse/CASSANDRA-9988?focusedCommentId=15117219=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15117219],
 now I see your point:
{quote}
2. We could probably get uniformly better behaviour by implementing exponential 
search for the leaf iterator; this would bring our total number of comparisons 
down from n lg n to n when searching *an approximately equal set of items*, 
without harming our best case complexity.
{quote}
In that case, {{i}} would be very small ({{i}} is where the target is), so 
expSearch ({{log( i ) + log( i )}}) just needs to do 2 comparisons. vs. 
binarySearch 5 comparisons: log(32) = 5 in worst case. Now the question is how 
we should define the test: How many re-seeks should we do? How we choose next 
target? If we randomly select the next target, it's unfair to expSearch, if we 
always select next value (basically use search to iterator the tree) is unfair 
to binSearch. It really depends on the user scenario. Do you have any 
suggestion?

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, 
> 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-26 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102593#comment-16102593
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

{quote}
However, these are iterators, and the expectation is that there will be 
multiple seeks during the iterator's lifetime. I had assumed that your 
searchFound and searchNotFound enumerated (searched for) a sequence of found / 
not found keys. There should probably be such variants.
{quote}
Okay, I can add more tests for that. But I don't think expSearch would make 
multiple-seeks within a leaf node better, as multiple seeks would make the 
{{n}} even smaller. And even in general, here is an article comparing 
binarySearch vs. exponentialSearch (galloping Search): [Beating the binary 
search algorithm – interpolation search, galloping 
search|http://blog.teamleadnet.com/2014/06/beating-binary-search-algorithm.html]
 Gallop(exponentialSearch) is actually not as good as JAVA default API 
{{Arrays.binarySearch}}:!http://3.bp.blogspot.com/-mPLIu6llPBY/U5JoP-6xivI/BMU/D9N5mhLNbOk/s1600/SearchTime.png!

For consistency concern, binarySearch() is wildly used in BTree code:
[BTree.java|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/BTree.java],
 
[NodeCursor.java|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/NodeCursor.java],
 
[TreeCursor.java|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/TreeCursor.java],
 
[NodeBuilder.java|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/NodeBuilder.java],
 
[BTreeRemoval.java|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/BTreeRemoval.java]

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, 
> 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-26 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102503#comment-16102503
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

{quote}
Also, what exactly are these graphs showing us?
{quote}
It's the throughput improvements before and after the change ({{($before - 
$after)/$before}}), for example for {{SearchFound}} test, when the treeSize is 
1, it has negative performance impact about {{-18%}}, when the treeSize is 16 
or 32, it is about -65%. I attached the excel files with raw data to the 
ticket.[^9988-test-result.xlsx][^9988-test-result-expsearch.xlsx]



> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, 
> 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-26 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102361#comment-16102361
 ] 

Benedict commented on CASSANDRA-9988:
-

Also, what exactly are these graphs showing us?

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-test-result3.png, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-26 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102358#comment-16102358
 ] 

Benedict commented on CASSANDRA-9988:
-

So, I took a quick peek at your actual test cases, and it seems you're only 
ever seeking into an iterator the once with a uniform key across its domain, 
then discarding it.  In this scenario, yes, you would expect exponential search 
to be more costly.

However, these are iterators, and the expectation is that there will be 
multiple seeks during the iterator's lifetime.  I had assumed that your 
searchFound and searchNotFound enumerated (searched for) a sequence of found / 
not found keys.  There should probably be such variants.

All that said, I feel like this ticket has dragged on long enough and we're 
probably well past diminishing returns for the change itself.  But these 
benchmarks are likely to persist a lot better than this discussion, so it is 
probably worthwhile getting them right.  It's also the case that we already use 
exponential search for the branch iterator (though this was decided back on 2.2 
storage format, so our N was much larger), and we should probably be consistent 
and settle it once and for all.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-test-result3.png, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-26 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102332#comment-16102332
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

They're good feedbacks. I did the tests for expSearch:
!9988-test-result3.png|test!
I think the exponentialSearch didn't out perform the binaraySearch is because:
# data size is too small. In this case, it only searches inside of one leaf, 
the maximum data size is 32 (default node size).
In theroy, the performance of exponential search is {{[O(log( i ) + log( i 
))|https://en.wikipedia.org/wiki/Exponential_search#Performance]}} ({{i}} is 
where the target is), vs. binary search is {{O(log( n ))}}.
Because exponentialSearch has 2 stages, it could do more compares when the 
{{n}} is small.
# Maybe Java API {{Arrays.binarySearch()}} is already doing some optimizations.

So for this JIRA, I don't think it worth doing exponentialSearch.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-test-result3.png, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-26 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101956#comment-16101956
 ] 

Benedict commented on CASSANDRA-9988:
-

Right. 

That, and my allusion to a "representative complexity" of object are meant to 
account for the fact that in C* proper we can expect comparisons to:
# miss L1/2 cache (and on first access all layers of cache; I don't know how 
many times we iterate objects, though; almost certainly more than once)
# be costly, as
## we will have many different comparators and object types to compare (so 
likely megamorphic call sites and no inlining)
## many of the object will be heavyweight
## neighbouring objects are likely to have lengthy common prefixes, so 
comparisons cannot abort early

Since exponential search's main benefit is reducing the number of comparisons, 
each of these mischaracterisations is likely to underrepresent its potential in 
the wild.

_Ideally_ we should be mitigating these confounders, else our decisions will 
potentially be inaccurate.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-trunk-new.txt, 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-26 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101919#comment-16101919
 ] 

Ariel Weisberg commented on CASSANDRA-9988:
---

I think Benedict was looking for a comparison of exponential search to binary 
search with a large number of small trees. The comparison you have is with many 
trees is with and without the optimization using binary search?

Since you already did the work of implementing exponential search it seems like 
it's worth hooking it up and testing btree size from 1-32 just to see so we 
don't have to revisit it later to decide if exponential search is a good idea.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-trunk-new.txt, 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-24 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099288#comment-16099288
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

hi [~benedict] I also added tests for different cellSize: 36, 100, 1k and 10K 
(they're generated from random UUID.tostring()). In the graph above, they're 
corresponding to 4 data points in each bTree size section. We don't see a 
significant difference between them. For example {{IteratorTree}}, the first 4 
data points are the ones for one-node-BTree with different cellSize, they have 
the similar improvements (+800%), maybe here is better graph:
!9988-result2.png|result!

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result2.png, 9988-result.png, 
> 9988-trunk-new.txt, 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-22 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097163#comment-16097163
 ] 

Benedict commented on CASSANDRA-9988:
-

So, for the record, I wasn't suggesting a huge tree (since these should by 
definition be unaffected by the patch, as they shouldn't ever use the leaf 
iterator), but a larger _corpora_ of trees so it cannot be guaranteed the trees 
and/or their datums are in at least L1/L2 cache (and perhaps, for comparison, 
L3+).


> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result.png, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-21 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097112#comment-16097112
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

I added tests for different cell size and 10K and 100K b-tree size:
| branch | utest |
|[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk]|[circleci#41|https://circleci.com/gh/cooldoger/cassandra/41]|
|[9988-nofix|https://github.com/cooldoger/cassandra/tree/9988-nofix]||
If the data could fit into one leaf node (<=32), the performance is 
significantly improved. Otherwise, there's no performance difference, 
regardless of the cell size.
!9988-data.png|data!
!9988-result.png|result!

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-data.png, 9988-result.png, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-21 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095840#comment-16095840
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

[~Anthony Grasso] Thank you for the review. I updated branch 
[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk], will share 
the test result later.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-19 Thread Anthony Grasso (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094089#comment-16094089
 ] 

Anthony Grasso commented on CASSANDRA-9988:
---

[~jay.zhuang] one other thing I noticed is that the patch needs to be updated 
so that it applies cleanly to _trunk_. Let me know if you need a hand with 
that. I am happy to make that change as well.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-19 Thread Anthony Grasso (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094069#comment-16094069
 ] 

Anthony Grasso commented on CASSANDRA-9988:
---

[~benedict], that is a very good point about falling out of the CPU cache. As 
you suggest, we should modify the generation of the objects such that they 
include at the very least a UUID sequence and possibly a string sequence. In 
addition, we should add a fourth B-Tree test; {{btreeExtraLarge}} which has 
100k elements in it.

Before committing, it would be good to make the above changes and then rerun 
the JMH benchmarks again. [~jay.zhuang] if you are busy, I am happy to make the 
changes and re-run the benchmarks.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092846#comment-16092846
 ] 

Benedict commented on CASSANDRA-9988:
-

Before committing, you may want to consider modifying the microbenchmark to use 
objects that are of a representative complexity, and over a large enough domain 
that the contents will not all be in the processor cache.  This might affect 
the choice of exponential search vs. binary search, as the extra comparisons 
needed are not currently representatively counted.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-07-19 Thread Anthony Grasso (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092832#comment-16092832
 ] 

Anthony Grasso commented on CASSANDRA-9988:
---

Have reviewed the code.

Ran the Microbench tests
{noformat}
ant test -Dtest.name=org.apache.cassandra.test.microbench.CompactionBench
ant test 
-Dtest.name=org.apache.cassandra.test.microbench.BTreeSearchIteratorBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.DirectorySizerBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.FastThreadExecutor
ant test -Dtest.name=org.apache.cassandra.test.microbench.FastThreadLocalBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.MutationBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.OutputStreamBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.OutputStreamBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.OutputStreamBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.PendingRangesBench
ant test -Dtest.name=org.apache.cassandra.test.microbench.ReadWriteTest
ant test 
-Dtest.name=org.apache.cassandra.test.microbench.StreamingHistogramBench
ant test 
-Dtest.name=org.apache.cassandra.test.microbench.PartitionImplementationTest
{noformat}

Ran the partition unit tests
{noformat}
ant test 
-Dtest.name=org.apache.cassandra.db.partition.PartitionImplementationTest
ant test -Dtest.name=org.apache.cassandra.db.partition.PartitionUpdateTest
{noformat}

Changes look good to me.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-06-21 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058529#comment-16058529
 ] 

Nate McCall commented on CASSANDRA-9988:


We have a few spare cycles coming up. Assigning to [~Anthony Grasso] for review 
and verification.  

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Anthony Grasso
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-04-12 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966767#comment-15966767
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

Rebased the 
[patch|https://github.com/apache/cassandra/compare/trunk...cooldoger:9988-trunk-onecommit-update2?expand=1]
 on trunk. Please review: 
[9988-trunk-onecommit-update2|https://github.com/apache/cassandra/compare/trunk...cooldoger:9988-trunk-onecommit-update2?expand=1]

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-01-25 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838635#comment-15838635
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

Thanks Benedict for your quick response. I updated the patch with your comment.

[~jasobrown] would you please review the patch?

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-01-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838313#comment-15838313
 ] 

Benedict commented on CASSANDRA-9988:
-

Sorry Jay, not anytime soon.  I no longer work actively on the project and am 
rather busy; you should look for one of the full-time contributors to lend a 
hand getting this merged.

I took a quick peek, and I'll note (without commenting on the quality of the 
code itself) for any potential reviewers out there that the patch looks to be 
in a good shape generally, so I shouldn't expect it to be an onerous review.

I did notice that in LeafBtree... line 108-109 are probably redundant, assuming 
line 106 does its job

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-01-25 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838272#comment-15838272
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

[~benedict] would you please review the patch?

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-trunk-new.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-01-22 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833718#comment-15833718
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

[Patch|https://github.com/cooldoger/cassandra/commit/8704eb8bbf94d5b556f67386d464a015ca98554b]
 is ready, please review.

Here is the JMH test result before and after the patch, overall it 
significantly improved the iterator for small BTree:
||  || Before the patch || After the patch || Improvement ||
| iteratorBigTreeTest | 171.271 | 167.288 | -2% |
| searchBigTreeTest | 12743.251 | 11833.277 | -7% |
| searchNotFoundBigTreeTest | 11140.872 | 10378.794 | -7% |
| iteratorLeafTreeTest | 5337.291 | 19788.253 | +271% |
| searchFoundLeafTreeTest | 18608.504 | 36809.601 | +98% |
| searchNotFoundLeafTreeTest | 19670.935 | 28880.540 | +47% |
| iteratorOneElemTreeTest | 65983.946 | 414832.010 | +529% |
| searchFoundOneElemTreeTest | 45796.932 | 288174.238 | +529% |
| searchNotFoundOneElemTreeTest | 43176.354 | 284008.493 | +558% |

Here is the code for JMH test without fix: 
https://github.com/cooldoger/cassandra/tree/9988-nofix
I tried exponential search: 
https://github.com/cooldoger/cassandra/commit/a23a3422a894acb72a47975da035bbf8da50fc4b
But the JMH test result is worse (other tests didn't change too much):
||  || Binary search || Exponential search || Improvement ||
| searchFoundLeafTreeTest | 36809.601 | 30074.161 | -18% |
| searchNotFoundLeafTreeTest | 28880.540 | 24540.700 | -15% |
| searchFoundOneElemTreeTest | 288174.238 | 233786.942 | -19% |
| searchNotFoundOneElemTreeTest | 284008.493 | 199549.589 | -30% |

Here are the JMH test results:
{noformat}
No Fix
 [java] BenchmarkMode  Cnt  
Score   Error   Units
 [java] BTreeSearchIteratorBench.iteratorBigTreeTestthrpt   40  
  171.271 ? 3.872  ops/ms
 [java] BTreeSearchIteratorBench.iteratorLeafTreeTest   thrpt   40  
 5337.291 ?   306.129  ops/ms
 [java] BTreeSearchIteratorBench.iteratorOneElemTreeTestthrpt   40  
65983.946 ? 10651.607  ops/ms
 [java] BTreeSearchIteratorBench.searchBigTreeTest  thrpt   40  
12743.251 ?  1171.717  ops/ms
 [java] BTreeSearchIteratorBench.searchFoundLeafTreeTestthrpt   40  
18608.504 ?  1671.129  ops/ms
 [java] BTreeSearchIteratorBench.searchFoundOneElemTreeTest thrpt   40  
45796.932 ?  6543.235  ops/ms
 [java] BTreeSearchIteratorBench.searchNotFoundBigTreeTest  thrpt   40  
11140.872 ?  1355.109  ops/ms
 [java] BTreeSearchIteratorBench.searchNotFoundLeafTreeTest thrpt   40  
19670.935 ?  1775.676  ops/ms
 [java] BTreeSearchIteratorBench.searchNotFoundOneElemTreeTest  thrpt   40  
43176.354 ?  6862.149  ops/ms

Fix [Binary Search]
 [java] BenchmarkMode  Cnt  
 Score   Error   Units
 [java] BTreeSearchIteratorBench.iteratorBigFullTreeTestthrpt   40  
   147.093 ? 7.968  ops/ms
 [java] BTreeSearchIteratorBench.iteratorBigTreeTestthrpt   40  
   167.288 ? 4.029  ops/ms
 [java] BTreeSearchIteratorBench.iteratorLeafTreeTest   thrpt   40  
 19788.253 ?   887.529  ops/ms
 [java] BTreeSearchIteratorBench.iteratorOneElemTreeTestthrpt   40  
414832.010 ? 16663.043  ops/ms
 [java] BTreeSearchIteratorBench.searchBigTreeTest  thrpt   40  
 11833.277 ?  1290.791  ops/ms
 [java] BTreeSearchIteratorBench.searchFoundLeafTreeTestthrpt   40  
 36809.601 ?  2403.004  ops/ms
 [java] BTreeSearchIteratorBench.searchFoundOneElemTreeTest thrpt   40  
288174.238 ? 11618.352  ops/ms
 [java] BTreeSearchIteratorBench.searchNotFoundBigTreeTest  thrpt   40  
 10378.794 ?  1229.196  ops/ms
 [java] BTreeSearchIteratorBench.searchNotFoundLeafTreeTest thrpt   40  
 28880.540 ?  1875.317  ops/ms
 [java] BTreeSearchIteratorBench.searchNotFoundOneElemTreeTest  thrpt   40  
284008.493 ?  7929.286  ops/ms

Fix [Exponential search]
 [java] BenchmarkMode  Cnt  
 Score   Error   Units
 [java] BTreeSearchIteratorBench.iteratorBigFullTreeTestthrpt   40  
   164.849 ? 3.571  ops/ms
 [java] BTreeSearchIteratorBench.iteratorBigTreeTestthrpt   40  
   161.228 ? 6.219  ops/ms
 [java] BTreeSearchIteratorBench.iteratorLeafTreeTest   thrpt   40  
 19828.490 ?   727.999  ops/ms
 [java] BTreeSearchIteratorBench.iteratorOneElemTreeTestthrpt   40  
397640.373 ? 18525.227  ops/ms
 [java] BTreeSearchIteratorBench.searchBigTreeTest  thrpt   40  
 11816.892 ?  1305.738  ops/ms
 [java] BTreeSearchIteratorBench.searchFoundLeafTreeTestthrpt   40  
 30074.161 ?  1916.827  ops/ms
 [java] BTreeSearchIteratorBench.searchFoundOneElemTreeTest thrpt   40  
233786.942 ?  

[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-01-18 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828506#comment-15828506
 ] 

Jason Brown commented on CASSANDRA-9988:


[~jay.zhuang] it's all yours :)

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Priority: Minor
>  Labels: patch
> Fix For: 3.x
>
> Attachments: trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2017-01-16 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824837#comment-15824837
 ] 

Jay Zhuang commented on CASSANDRA-9988:
---

Can I take this one?

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Priority: Minor
>  Labels: patch
> Fix For: 3.x
>
> Attachments: trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2016-01-27 Thread Piotr Jastrzebski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118944#comment-15118944
 ] 

Piotr Jastrzebski commented on CASSANDRA-9988:
--

That's very interesting. Thanks for the article :). 

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Priority: Minor
>  Labels: patch
> Fix For: 3.x
>
> Attachments: trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2016-01-27 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118935#comment-15118935
 ] 

Sam Tunnicliffe commented on CASSANDRA-9988:


[~haaawk] see 
http://mechanical-sympathy.blogspot.co.uk/2012/04/invoke-interface-optimisations.html
 for example

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Priority: Minor
>  Labels: patch
> Fix For: 3.x
>
> Attachments: trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2016-01-27 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118947#comment-15118947
 ] 

Benedict commented on CASSANDRA-9988:
-

FTR, the second half of my point 3 is very weak and only really likely to 
measurably affect icache occupancy, which is unlikely to be elicited in a 
microbenchmark, so feel free to ignore that.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Priority: Minor
>  Labels: patch
> Fix For: 3.x
>
> Attachments: trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2016-01-26 Thread Piotr Jastrzebski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117922#comment-15117922
 ] 

Piotr Jastrzebski commented on CASSANDRA-9988:
--

What do you mean by efficient method dispatch? Could you please explain me why 
having two implementations of an interface will result in faster method 
invocations (if that's what you have in mind) than when having three 
implementations? I know that implementing too many interfaces can have negative 
impact on performance due to collisions in method offsets but I never heard 
that number of implementations has any performance impact.

> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Priority: Minor
>  Labels: patch
> Fix For: 3.x
>
> Attachments: trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

2016-01-26 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117219#comment-15117219
 ] 

Benedict commented on CASSANDRA-9988:
-

Hi Piotr,

Thanks for your patch.  A few comments:

# It would be great to have just one Leaf iterator, so that we can benefit from 
efficient method despatch, as there will be only two search iterator 
implementations
# We could probably get uniformly better behaviour by implementing [exponential 
search|https://en.wikipedia.org/wiki/Exponential_search] for the leaf iterator; 
this would bring our total number of comparisons down from {{n lg n}} to {{n}} 
when searching an approximately equal set of items, without harming our best 
case complexity.
# It would be preferable to get numbers using JMH, and comparing a codebase 
that only contains the full search iterator, versus one that contains both (and 
performs searches that cross the boundaries, so it's unknown which will be 
returned), as method despatch will potentially be harmed


> Introduce leaf-only iterator
> 
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Priority: Minor
>  Labels: patch
> Fix For: 3.x
>
> Attachments: trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)