[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248740#comment-16248740 ] Jay Zhuang commented on CASSANDRA-9988: --- Thanks [~jasobrown] for the review. bq. just to confirm, {{FullBTreeSearchIterator}} is just a rename of the previous {{BTreeSearchIterator}} class? My diff'ing seems to confirm that, but I'd like the extra level of sanity checking. Yes. bq. {{BTreeSet}} switch imports back to original format Updated. bq. in {{LeafBTreeSearchIterator}}, move {{#compareToFirst}} and {{#searchNext}} closer to the methods from where they are called. Or better yet, since each is only called from one place, just inline the functionality into those calling methods. Updated. Removed {{#compareToFirst}} and moved {{#searchNext}} For microbench, I was running the same microbench test with and without the patch. Having them in the same test would be easier to compare the test results. Here is updated branch: | Branch | uTest | | [9988-update|https://github.com/cooldoger/cassandra/tree/9988-update] | [!https://circleci.com/gh/cooldoger/cassandra/tree/9988-update.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/9988-update] | > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result.png, > 9988-result2.png, 9988-result3.png, 9988-test-result-expsearch.xlsx, > 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-test-result3.png, > 9988-trunk-new-update.txt, 9988-trunk-new.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244848#comment-16244848 ] Jason Brown commented on CASSANDRA-9988: Overall, I think the patch is pretty solid. A couple of comments and nits: - is {{BTreeSearchIterator}} necessary anymore? It's just an empty interface now - just to confirm, {{FullBTreeSearchIterator}} is just a rename of the previous {{BTreeSearchIterator}} class? My diff'ing seems to confirm that, but I'd like the extra level of sanity checking. nits: - {{BTreeSet}} switch imports back to original format - in {{LeafBTreeSearchIterator}}, move {{#compareToFirst}} and {{#searchNext}} closer to the methods from where they are called. Or better yet, since each is only called from one place, just inline the functionality into those calling methods. My biggest problem is with the microbench. I know this ticket has gone through a bunch on contortions over it's life, so maybe a little confusion is reasonable. However, what I was really hoping to see was a comparison of the leaf vs full (tree) iterators, as that's what this whole ticket is about. I took liberty of reworking the mircobench to explicitly test the differences bewtween leaf and tree iterators, as well as reduce of benchmark executions (due to every value between 0 and 31, inclusive, being a param to the {{targetIdx}} field. (I simply removed about 2/3 of those values, as I think we only really care about testing reading from the beginning, middle, and end of the btree entries; anything else just bloats total test execution time). Further, as {{#iteratorTree()}} and {{#multiSearchFound}} don't use the {{targetIdx}} param, I think they can be moved to another class as they would be executed once for each of the entries in {{targetIdx}} (I'm not awatre of a way to ignore that in JMH). My version of your branch with the updated microbench is [here|https://github.com/jasobrown/cassandra/tree/9988] > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result.png, > 9988-result2.png, 9988-result3.png, 9988-test-result-expsearch.xlsx, > 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-test-result3.png, > 9988-trunk-new-update.txt, 9988-trunk-new.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120862#comment-16120862 ] Jay Zhuang commented on CASSANDRA-9988: --- [~Anthony Grasso] would you please final review the patch and commit it? > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, > 9988-result3.png, 9988-result.png, 9988-test-result3.png, > 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, > 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118100#comment-16118100 ] Benedict commented on CASSANDRA-9988: - bq. The original implementation is valid. I think that's another example shows the code is not that easy to understand. Mea culpa, although I think that's more an indictment of my hubris and our mutual disinterest. If fifteen distracted minutes in a JIRA comment window procrastinating from real work yields a minor bug, it's probably not *that* complicated, and an IDE, test suite and a couple of hours would probably more than conquer it. bq. The graph is the search performance (throughput) for searching target on the different positions. Except that this isn't the same, because the search domain is different with different starting positions. This is also over-counting unrelated test setup costs. One would expect that, with a leaf size of 32, iterating the whole contents would be _approx._ 4x faster with exp. search than binary search; the fact it is only 3x is probably down to those extra costs. The worst inversion of performance would be around 2x (i.e. exp. search being 2x slower than b.search at its worst), with logarithmically spaced items - i.e. at position 16, 8, 4, 2, 1. However, in the slowest case for exp. search, we are doing much less work than the slowest for binary search. Of course, with a limit to 32 items the exp. search benefits are much less apparent. If we had a leaf size of 64 it would become much more stark. With a leaf size of 16, it would also become completely irrelevant. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, > 9988-result3.png, 9988-result.png, 9988-test-result3.png, > 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, > 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117669#comment-16117669 ] Jay Zhuang commented on CASSANDRA-9988: --- I created CASSANDRA-13748 for expSearch follow up. Rebased the patch to the current trunk and squashed the changes: | branch | uTest | | [9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] | [circleci#74 passed|https://circleci.com/gh/cooldoger/cassandra/74] | > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, > 9988-result3.png, 9988-result.png, 9988-test-result3.png, > 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, > 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117210#comment-16117210 ] Jay Zhuang commented on CASSANDRA-9988: --- Hi [~benedict] thanks for the quick response. The graph is the search performance (throughput) for searching target on the different positions. The X axis is the position, the Y axis is the throughput, the bigger is the better. For multi-seeks, we can assume the current position is 0, so if the next search value is the next adjacent one (position 1), expSearch is 3 times better than binSearch, but binSearchOptimized could actually do slightly better than expSearch in that case. {quote} if we want just a random 8 items in a 32 item iterator, we should expect exp. search to win on average {quote} Yes, I agree. expSearch should out perform binSearch if we choose random 8 items out of 32, as these 8 items should be close to each other. {quote} I did indicate my changes assumed your implementation was valid (I did not attempt to verify it, and the original looks to be the source of the bug). {quote} The original implementation is valid. I think that's another example shows the code is not that easy to understand. If you follows the original implementation, it should be: {{return Arrays.binarySearch(keys, lb, Math.min(ub + 1, probe_ub {color:red}+ 1{color}), key, comparator);}} baiscally it's just one-off issue. {quote} I will, separately, note that I suspect we're both well past interest in this discussion. So if you'd rather just go with binary search, feel free. {quote} Yes, I agree, and I'm really appreciated your reviews and comments. Let's create another JIRA to look into the adoption of expSearch, not only for this case, it may improve performance for other places too. and it should be in util/ > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, > 9988-result3.png, 9988-result.png, 9988-test-result3.png, > 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, > 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116363#comment-16116363 ] Benedict commented on CASSANDRA-9988: - I will, separately, note that I suspect we're both well past interest in this discussion. So if you'd rather just go with binary search, feel free. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, > 9988-result3.png, 9988-result.png, 9988-test-result3.png, > 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, > 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116265#comment-16116265 ] Benedict commented on CASSANDRA-9988: - bq. very bad average performance Could you clarify exactly what the graph is showing? Is it what we agreed to test, i.e. a full consumption of the iterator via calls to {{next()}} ? Because I'm pretty sure the "average performance" of binarysearch in this case is worse by a much larger margin. We really need a range of data points to make a decision on, so we can put the whole thing to bed for good. We can also (if we care) tune exp. search's variance by setting the first distance it jumps. We currently jump to the immediately following item - if we jump 4 items first, we take between 3 and 7 comparisons, as opposed to between 1 and 9. However, there is advantage to higher variance, since the lower cost operations are going to be encountered more likely when we consume multiple items from the iterator. bq. optimized to search the first and second value exp. search is optimised for any _sequence_ of values. i.e., if we want just a random 8 items in a 32 item iterator, we should expect exp. search to win on average. *If you would like, you could modify your test overall to simply search for a random number of items, randomly selected from the iterator's contents.* This should favour exp. search, esp. as the number of random items increases. In real systems, selections are more likely to bias towards adjacent items than purely random, so any such advantage will probably be understated. bq. complicate code, hard to maintain I did indicate my changes assumed your implementation was valid (I did not attempt to verify it, and the original looks to be the source of the bug). Since there was minimal effort on both our parts to just test the hypothesis, I don't think this should be a major concern. This code snippet is trivial to add perfect test coverage for, so we could randomly generate code until it worked if we wanted. bq. Dir.DESC will have ... performance degradation Clearly, the algorithm as stands would not be viable for DESC, which I did indicate in my prior comment. I had hoped we could simply validate its behaviour and then decide if it is worth writing one that is agnostic to direction. bq. I'm not familiar with the whole cassandra code base, so not sure how often we search for the adjacent value As previously indicated, I'm pretty sure we, for _every query_, consume iterators in a manner that touches adjacent (or nearly) items for most of an iterator's contents. We do this for result row serialization and deserialization, and we do it for query column selection. i.e., we do this a great deal. That is, unless my recollection is faulty or the codebase has changed significantly. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, > 9988-result3.png, 9988-result.png, 9988-test-result3.png, > 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, > 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116055#comment-16116055 ] Jay Zhuang commented on CASSANDRA-9988: --- Here are the test results to search target on the different index with the following search algorithms: 1. binary search 2. exponential search 3. binary search optimized for next adjacent value !9988-3tests.png|tests! Here are my takes from the results: h4. 1. binary search: Pros: * best on average and evenly distributed; * Simple code, use JAVA native API. Cons: * not optimized to search for the adjacent value. h4. 2. exponential search: Pros: * optimized to search the first and second value for {{Dir.ASC}}. Cons: * very bad average performance: compare to binarySearch it has only half the throughput. {{Dir.DESC}} will have the same performance degradation and we do use DESC for some places; * complicate code, hard to maintain: For example, there's a bug in above code, it should be {{while (prob_ub <= ub && (c = comparator.compare(keys\[probe_ub], key)) < 0)}} it only impacts the tree size {{2^n - 1}} and search for the last element, it's tricky to find out. I think we could make the algorithm work for {{Dir.DESC}} too, but it will make the code even more complicated. h4. 3. binary search optimized for next adjacent value Pros: * Optimized to search next adjacent value, works for both {{Dir.ASC}} and {{Dir.DESC}} Cons: * Average performance is a litter bit worse than binarySearch I'm not familiar with the whole cassandra code base, so not sure how often we search for the adjacent value. Personally, I would still prefer binarySearch: * consistency with other cassandra code, plus it's the current behave right now for btree search; * smaller mean deviations, could be good for tp99; * clean code. But I'm fine with any solutions: | 1. binary search | [9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] | | 2. exponential search | [9988-trunk-exp|https://github.com/cooldoger/cassandra/tree/9988-trunk-exp] | | 3. binary search optimized| [9988-trunk-x|https://github.com/cooldoger/cassandra/tree/9988-trunk-x] | > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, > 9988-result3.png, 9988-result.png, 9988-test-result3.png, > 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, > 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106073#comment-16106073 ] Benedict commented on CASSANDRA-9988: - Assuming your implementation is correct, here is a modified version that works {code} int lb = forwards ? nextPos : lowerBound; // inclusive int ub = forwards ? upperBound : nextPos; // inclusive int bound = 0; int c = -1; while (lb + bound < ub && (c = comparator.compare(keys[lb + bound], key)) < 0) bound = Math.max(bound * 2, 1); if (c == 0) { return lb + bound; } // note, by checking == 0, we do not need to include it in bsearch, so we do not need to increment ub return Arrays.binarySearch(keys, lb + (bound/2), Math.min(lb + bound, ub), key, comparator); {code} This implementation as a whole could be cleaned up a little, and there are a variety of ways to bit-twiddle to get {{bound == 0}} on first iteration, but this one is at least very clear. A "faster" variant would be: {code} while (lb + (bound / 2) < ub && (c = comparator.compare(keys[lb + (bound / 2)], key)) < 0) bound *= 2; {code} A faster-still variant would just offset lb by -1 at the start, so there are no recurring costs, and restore it after the loop. But better than this would be something like: {code} int lb = forwards ? nextPos : lowerBound; // inclusive int ub = forwards ? upperBound : nextPos; // inclusive int c = -1; int probe_ub = lb; int probe_incr = 1; while (probe_ub < ub && (c = comparator.compare(keys[probe_ub], key)) < 0) { lb = probe_ub; probe_ub += probe_incr; probe_incr *= 2; } if (c == 0) { return probe_ub; } return Arrays.binarySearch(keys, lb, probe_ub, key, comparator); {code} better still I haven't looked at the codebase in a long time, and don't have it checked out, so haven't consulted the branch iterator version, and don't actually recall precisely what algorithm we used. I don't recall it being a particularly elegant approach, though. In fact, it's possible it only compares the first element and then resorts to binary search; I do not recall for sure if we carried exponential search over from the ArrayBackedColumns (or whatever it used to be called). I will note the exponential search as it stands here probably does not behave very well for reverse iteration. Also, apologies for against-style-guide variable names. But this comment box is too poor an editor to revise it any further. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, > 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105981#comment-16105981 ] Jay Zhuang commented on CASSANDRA-9988: --- Hi [~benedict], would you please point me the exponential search implementation for the branch iterator? I tried to improve the [expSearch|https://github.com/cooldoger/cassandra/commit/a23a3422a894acb72a47975da035bbf8da50fc4b], here is [the improved version|https://github.com/cooldoger/cassandra/commit/7330646e553afd4b4b0011f6163bf9032f9cb605] which will do 1 comparison if it finds the target at the first stage. But that's optimized to find the target on {{2^i}} position, doesn't cover position 0. To cover position 0 (the first element), I need to add more checks and code, it makes the code hard to understand. And if we really do that, I could have the same optimization for binarySearch (manually check the first 1 or 2 elements), which results in the similar improvement. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, > 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103033#comment-16103033 ] Benedict commented on CASSANDRA-9988: - Well, it's been a while since I looked at the internals, but iirc there is an extremely common scenario in which we enumerate most of the contents of such an iterator via search, and that is in the case of selecting columns from a row. Since we already have binary search's best case and exponential search's worst case in the benchmarks, and given we can expect the inverse scenario to be common, why not supplement with that so we can see the two extremes? > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, > 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102704#comment-16102704 ] Jay Zhuang commented on CASSANDRA-9988: --- Hi [~benedict], re-read [your comments 1.5 years' ago|https://issues.apache.org/jira/browse/CASSANDRA-9988?focusedCommentId=15117219=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15117219], now I see your point: {quote} 2. We could probably get uniformly better behaviour by implementing exponential search for the leaf iterator; this would bring our total number of comparisons down from n lg n to n when searching *an approximately equal set of items*, without harming our best case complexity. {quote} In that case, {{i}} would be very small ({{i}} is where the target is), so expSearch ({{log( i ) + log( i )}}) just needs to do 2 comparisons. vs. binarySearch 5 comparisons: log(32) = 5 in worst case. Now the question is how we should define the test: How many re-seeks should we do? How we choose next target? If we randomly select the next target, it's unfair to expSearch, if we always select next value (basically use search to iterator the tree) is unfair to binSearch. It really depends on the user scenario. Do you have any suggestion? > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, > 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102593#comment-16102593 ] Jay Zhuang commented on CASSANDRA-9988: --- {quote} However, these are iterators, and the expectation is that there will be multiple seeks during the iterator's lifetime. I had assumed that your searchFound and searchNotFound enumerated (searched for) a sequence of found / not found keys. There should probably be such variants. {quote} Okay, I can add more tests for that. But I don't think expSearch would make multiple-seeks within a leaf node better, as multiple seeks would make the {{n}} even smaller. And even in general, here is an article comparing binarySearch vs. exponentialSearch (galloping Search): [Beating the binary search algorithm – interpolation search, galloping search|http://blog.teamleadnet.com/2014/06/beating-binary-search-algorithm.html] Gallop(exponentialSearch) is actually not as good as JAVA default API {{Arrays.binarySearch}}:!http://3.bp.blogspot.com/-mPLIu6llPBY/U5JoP-6xivI/BMU/D9N5mhLNbOk/s1600/SearchTime.png! For consistency concern, binarySearch() is wildly used in BTree code: [BTree.java|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/BTree.java], [NodeCursor.java|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/NodeCursor.java], [TreeCursor.java|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/TreeCursor.java], [NodeBuilder.java|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/NodeBuilder.java], [BTreeRemoval.java|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/BTreeRemoval.java] > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, > 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102503#comment-16102503 ] Jay Zhuang commented on CASSANDRA-9988: --- {quote} Also, what exactly are these graphs showing us? {quote} It's the throughput improvements before and after the change ({{($before - $after)/$before}}), for example for {{SearchFound}} test, when the treeSize is 1, it has negative performance impact about {{-18%}}, when the treeSize is 16 or 32, it is about -65%. I attached the excel files with raw data to the ticket.[^9988-test-result.xlsx][^9988-test-result-expsearch.xlsx] > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, > 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102361#comment-16102361 ] Benedict commented on CASSANDRA-9988: - Also, what exactly are these graphs showing us? > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-test-result3.png, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102358#comment-16102358 ] Benedict commented on CASSANDRA-9988: - So, I took a quick peek at your actual test cases, and it seems you're only ever seeking into an iterator the once with a uniform key across its domain, then discarding it. In this scenario, yes, you would expect exponential search to be more costly. However, these are iterators, and the expectation is that there will be multiple seeks during the iterator's lifetime. I had assumed that your searchFound and searchNotFound enumerated (searched for) a sequence of found / not found keys. There should probably be such variants. All that said, I feel like this ticket has dragged on long enough and we're probably well past diminishing returns for the change itself. But these benchmarks are likely to persist a lot better than this discussion, so it is probably worthwhile getting them right. It's also the case that we already use exponential search for the branch iterator (though this was decided back on 2.2 storage format, so our N was much larger), and we should probably be consistent and settle it once and for all. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-test-result3.png, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102332#comment-16102332 ] Jay Zhuang commented on CASSANDRA-9988: --- They're good feedbacks. I did the tests for expSearch: !9988-test-result3.png|test! I think the exponentialSearch didn't out perform the binaraySearch is because: # data size is too small. In this case, it only searches inside of one leaf, the maximum data size is 32 (default node size). In theroy, the performance of exponential search is {{[O(log( i ) + log( i ))|https://en.wikipedia.org/wiki/Exponential_search#Performance]}} ({{i}} is where the target is), vs. binary search is {{O(log( n ))}}. Because exponentialSearch has 2 stages, it could do more compares when the {{n}} is small. # Maybe Java API {{Arrays.binarySearch()}} is already doing some optimizations. So for this JIRA, I don't think it worth doing exponentialSearch. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-test-result3.png, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101956#comment-16101956 ] Benedict commented on CASSANDRA-9988: - Right. That, and my allusion to a "representative complexity" of object are meant to account for the fact that in C* proper we can expect comparisons to: # miss L1/2 cache (and on first access all layers of cache; I don't know how many times we iterate objects, though; almost certainly more than once) # be costly, as ## we will have many different comparators and object types to compare (so likely megamorphic call sites and no inlining) ## many of the object will be heavyweight ## neighbouring objects are likely to have lengthy common prefixes, so comparisons cannot abort early Since exponential search's main benefit is reducing the number of comparisons, each of these mischaracterisations is likely to underrepresent its potential in the wild. _Ideally_ we should be mitigating these confounders, else our decisions will potentially be inaccurate. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-trunk-new.txt, 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101919#comment-16101919 ] Ariel Weisberg commented on CASSANDRA-9988: --- I think Benedict was looking for a comparison of exponential search to binary search with a large number of small trees. The comparison you have is with many trees is with and without the optimization using binary search? Since you already did the work of implementing exponential search it seems like it's worth hooking it up and testing btree size from 1-32 just to see so we don't have to revisit it later to decide if exponential search is a good idea. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-trunk-new.txt, 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099288#comment-16099288 ] Jay Zhuang commented on CASSANDRA-9988: --- hi [~benedict] I also added tests for different cellSize: 36, 100, 1k and 10K (they're generated from random UUID.tostring()). In the graph above, they're corresponding to 4 data points in each bTree size section. We don't see a significant difference between them. For example {{IteratorTree}}, the first 4 data points are the ones for one-node-BTree with different cellSize, they have the similar improvements (+800%), maybe here is better graph: !9988-result2.png|result! > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result.png, > 9988-trunk-new.txt, 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097163#comment-16097163 ] Benedict commented on CASSANDRA-9988: - So, for the record, I wasn't suggesting a huge tree (since these should by definition be unaffected by the patch, as they shouldn't ever use the leaf iterator), but a larger _corpora_ of trees so it cannot be guaranteed the trees and/or their datums are in at least L1/L2 cache (and perhaps, for comparison, L3+). > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result.png, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097112#comment-16097112 ] Jay Zhuang commented on CASSANDRA-9988: --- I added tests for different cell size and 10K and 100K b-tree size: | branch | utest | |[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk]|[circleci#41|https://circleci.com/gh/cooldoger/cassandra/41]| |[9988-nofix|https://github.com/cooldoger/cassandra/tree/9988-nofix]|| If the data could fit into one leaf node (<=32), the performance is significantly improved. Otherwise, there's no performance difference, regardless of the cell size. !9988-data.png|data! !9988-result.png|result! > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result.png, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095840#comment-16095840 ] Jay Zhuang commented on CASSANDRA-9988: --- [~Anthony Grasso] Thank you for the review. I updated branch [9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk], will share the test result later. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094089#comment-16094089 ] Anthony Grasso commented on CASSANDRA-9988: --- [~jay.zhuang] one other thing I noticed is that the patch needs to be updated so that it applies cleanly to _trunk_. Let me know if you need a hand with that. I am happy to make that change as well. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094069#comment-16094069 ] Anthony Grasso commented on CASSANDRA-9988: --- [~benedict], that is a very good point about falling out of the CPU cache. As you suggest, we should modify the generation of the objects such that they include at the very least a UUID sequence and possibly a string sequence. In addition, we should add a fourth B-Tree test; {{btreeExtraLarge}} which has 100k elements in it. Before committing, it would be good to make the above changes and then rerun the JMH benchmarks again. [~jay.zhuang] if you are busy, I am happy to make the changes and re-run the benchmarks. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092846#comment-16092846 ] Benedict commented on CASSANDRA-9988: - Before committing, you may want to consider modifying the microbenchmark to use objects that are of a representative complexity, and over a large enough domain that the contents will not all be in the processor cache. This might affect the choice of exponential search vs. binary search, as the extra comparisons needed are not currently representatively counted. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092832#comment-16092832 ] Anthony Grasso commented on CASSANDRA-9988: --- Have reviewed the code. Ran the Microbench tests {noformat} ant test -Dtest.name=org.apache.cassandra.test.microbench.CompactionBench ant test -Dtest.name=org.apache.cassandra.test.microbench.BTreeSearchIteratorBench ant test -Dtest.name=org.apache.cassandra.test.microbench.DirectorySizerBench ant test -Dtest.name=org.apache.cassandra.test.microbench.FastThreadExecutor ant test -Dtest.name=org.apache.cassandra.test.microbench.FastThreadLocalBench ant test -Dtest.name=org.apache.cassandra.test.microbench.MutationBench ant test -Dtest.name=org.apache.cassandra.test.microbench.OutputStreamBench ant test -Dtest.name=org.apache.cassandra.test.microbench.OutputStreamBench ant test -Dtest.name=org.apache.cassandra.test.microbench.OutputStreamBench ant test -Dtest.name=org.apache.cassandra.test.microbench.PendingRangesBench ant test -Dtest.name=org.apache.cassandra.test.microbench.ReadWriteTest ant test -Dtest.name=org.apache.cassandra.test.microbench.StreamingHistogramBench ant test -Dtest.name=org.apache.cassandra.test.microbench.PartitionImplementationTest {noformat} Ran the partition unit tests {noformat} ant test -Dtest.name=org.apache.cassandra.db.partition.PartitionImplementationTest ant test -Dtest.name=org.apache.cassandra.db.partition.PartitionUpdateTest {noformat} Changes look good to me. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058529#comment-16058529 ] Nate McCall commented on CASSANDRA-9988: We have a few spare cycles coming up. Assigning to [~Anthony Grasso] for review and verification. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Anthony Grasso >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966767#comment-15966767 ] Jay Zhuang commented on CASSANDRA-9988: --- Rebased the [patch|https://github.com/apache/cassandra/compare/trunk...cooldoger:9988-trunk-onecommit-update2?expand=1] on trunk. Please review: [9988-trunk-onecommit-update2|https://github.com/apache/cassandra/compare/trunk...cooldoger:9988-trunk-onecommit-update2?expand=1] > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838635#comment-15838635 ] Jay Zhuang commented on CASSANDRA-9988: --- Thanks Benedict for your quick response. I updated the patch with your comment. [~jasobrown] would you please review the patch? > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838313#comment-15838313 ] Benedict commented on CASSANDRA-9988: - Sorry Jay, not anytime soon. I no longer work actively on the project and am rather busy; you should look for one of the full-time contributors to lend a hand getting this merged. I took a quick peek, and I'll note (without commenting on the quality of the code itself) for any potential reviewers out there that the patch looks to be in a good shape generally, so I shouldn't expect it to be an onerous review. I did notice that in LeafBtree... line 108-109 are probably redundant, assuming line 106 does its job > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838272#comment-15838272 ] Jay Zhuang commented on CASSANDRA-9988: --- [~benedict] would you please review the patch? > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833718#comment-15833718 ] Jay Zhuang commented on CASSANDRA-9988: --- [Patch|https://github.com/cooldoger/cassandra/commit/8704eb8bbf94d5b556f67386d464a015ca98554b] is ready, please review. Here is the JMH test result before and after the patch, overall it significantly improved the iterator for small BTree: || || Before the patch || After the patch || Improvement || | iteratorBigTreeTest | 171.271 | 167.288 | -2% | | searchBigTreeTest | 12743.251 | 11833.277 | -7% | | searchNotFoundBigTreeTest | 11140.872 | 10378.794 | -7% | | iteratorLeafTreeTest | 5337.291 | 19788.253 | +271% | | searchFoundLeafTreeTest | 18608.504 | 36809.601 | +98% | | searchNotFoundLeafTreeTest | 19670.935 | 28880.540 | +47% | | iteratorOneElemTreeTest | 65983.946 | 414832.010 | +529% | | searchFoundOneElemTreeTest | 45796.932 | 288174.238 | +529% | | searchNotFoundOneElemTreeTest | 43176.354 | 284008.493 | +558% | Here is the code for JMH test without fix: https://github.com/cooldoger/cassandra/tree/9988-nofix I tried exponential search: https://github.com/cooldoger/cassandra/commit/a23a3422a894acb72a47975da035bbf8da50fc4b But the JMH test result is worse (other tests didn't change too much): || || Binary search || Exponential search || Improvement || | searchFoundLeafTreeTest | 36809.601 | 30074.161 | -18% | | searchNotFoundLeafTreeTest | 28880.540 | 24540.700 | -15% | | searchFoundOneElemTreeTest | 288174.238 | 233786.942 | -19% | | searchNotFoundOneElemTreeTest | 284008.493 | 199549.589 | -30% | Here are the JMH test results: {noformat} No Fix [java] BenchmarkMode Cnt Score Error Units [java] BTreeSearchIteratorBench.iteratorBigTreeTestthrpt 40 171.271 ? 3.872 ops/ms [java] BTreeSearchIteratorBench.iteratorLeafTreeTest thrpt 40 5337.291 ? 306.129 ops/ms [java] BTreeSearchIteratorBench.iteratorOneElemTreeTestthrpt 40 65983.946 ? 10651.607 ops/ms [java] BTreeSearchIteratorBench.searchBigTreeTest thrpt 40 12743.251 ? 1171.717 ops/ms [java] BTreeSearchIteratorBench.searchFoundLeafTreeTestthrpt 40 18608.504 ? 1671.129 ops/ms [java] BTreeSearchIteratorBench.searchFoundOneElemTreeTest thrpt 40 45796.932 ? 6543.235 ops/ms [java] BTreeSearchIteratorBench.searchNotFoundBigTreeTest thrpt 40 11140.872 ? 1355.109 ops/ms [java] BTreeSearchIteratorBench.searchNotFoundLeafTreeTest thrpt 40 19670.935 ? 1775.676 ops/ms [java] BTreeSearchIteratorBench.searchNotFoundOneElemTreeTest thrpt 40 43176.354 ? 6862.149 ops/ms Fix [Binary Search] [java] BenchmarkMode Cnt Score Error Units [java] BTreeSearchIteratorBench.iteratorBigFullTreeTestthrpt 40 147.093 ? 7.968 ops/ms [java] BTreeSearchIteratorBench.iteratorBigTreeTestthrpt 40 167.288 ? 4.029 ops/ms [java] BTreeSearchIteratorBench.iteratorLeafTreeTest thrpt 40 19788.253 ? 887.529 ops/ms [java] BTreeSearchIteratorBench.iteratorOneElemTreeTestthrpt 40 414832.010 ? 16663.043 ops/ms [java] BTreeSearchIteratorBench.searchBigTreeTest thrpt 40 11833.277 ? 1290.791 ops/ms [java] BTreeSearchIteratorBench.searchFoundLeafTreeTestthrpt 40 36809.601 ? 2403.004 ops/ms [java] BTreeSearchIteratorBench.searchFoundOneElemTreeTest thrpt 40 288174.238 ? 11618.352 ops/ms [java] BTreeSearchIteratorBench.searchNotFoundBigTreeTest thrpt 40 10378.794 ? 1229.196 ops/ms [java] BTreeSearchIteratorBench.searchNotFoundLeafTreeTest thrpt 40 28880.540 ? 1875.317 ops/ms [java] BTreeSearchIteratorBench.searchNotFoundOneElemTreeTest thrpt 40 284008.493 ? 7929.286 ops/ms Fix [Exponential search] [java] BenchmarkMode Cnt Score Error Units [java] BTreeSearchIteratorBench.iteratorBigFullTreeTestthrpt 40 164.849 ? 3.571 ops/ms [java] BTreeSearchIteratorBench.iteratorBigTreeTestthrpt 40 161.228 ? 6.219 ops/ms [java] BTreeSearchIteratorBench.iteratorLeafTreeTest thrpt 40 19828.490 ? 727.999 ops/ms [java] BTreeSearchIteratorBench.iteratorOneElemTreeTestthrpt 40 397640.373 ? 18525.227 ops/ms [java] BTreeSearchIteratorBench.searchBigTreeTest thrpt 40 11816.892 ? 1305.738 ops/ms [java] BTreeSearchIteratorBench.searchFoundLeafTreeTestthrpt 40 30074.161 ? 1916.827 ops/ms [java] BTreeSearchIteratorBench.searchFoundOneElemTreeTest thrpt 40 233786.942 ?
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828506#comment-15828506 ] Jason Brown commented on CASSANDRA-9988: [~jay.zhuang] it's all yours :) > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Priority: Minor > Labels: patch > Fix For: 3.x > > Attachments: trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824837#comment-15824837 ] Jay Zhuang commented on CASSANDRA-9988: --- Can I take this one? > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Priority: Minor > Labels: patch > Fix For: 3.x > > Attachments: trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118944#comment-15118944 ] Piotr Jastrzebski commented on CASSANDRA-9988: -- That's very interesting. Thanks for the article :). > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Priority: Minor > Labels: patch > Fix For: 3.x > > Attachments: trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118935#comment-15118935 ] Sam Tunnicliffe commented on CASSANDRA-9988: [~haaawk] see http://mechanical-sympathy.blogspot.co.uk/2012/04/invoke-interface-optimisations.html for example > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Priority: Minor > Labels: patch > Fix For: 3.x > > Attachments: trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118947#comment-15118947 ] Benedict commented on CASSANDRA-9988: - FTR, the second half of my point 3 is very weak and only really likely to measurably affect icache occupancy, which is unlikely to be elicited in a microbenchmark, so feel free to ignore that. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Priority: Minor > Labels: patch > Fix For: 3.x > > Attachments: trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117922#comment-15117922 ] Piotr Jastrzebski commented on CASSANDRA-9988: -- What do you mean by efficient method dispatch? Could you please explain me why having two implementations of an interface will result in faster method invocations (if that's what you have in mind) than when having three implementations? I know that implementing too many interfaces can have negative impact on performance due to collisions in method offsets but I never heard that number of implementations has any performance impact. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Priority: Minor > Labels: patch > Fix For: 3.x > > Attachments: trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117219#comment-15117219 ] Benedict commented on CASSANDRA-9988: - Hi Piotr, Thanks for your patch. A few comments: # It would be great to have just one Leaf iterator, so that we can benefit from efficient method despatch, as there will be only two search iterator implementations # We could probably get uniformly better behaviour by implementing [exponential search|https://en.wikipedia.org/wiki/Exponential_search] for the leaf iterator; this would bring our total number of comparisons down from {{n lg n}} to {{n}} when searching an approximately equal set of items, without harming our best case complexity. # It would be preferable to get numbers using JMH, and comparing a codebase that only contains the full search iterator, versus one that contains both (and performs searches that cross the boundaries, so it's unknown which will be returned), as method despatch will potentially be harmed > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Priority: Minor > Labels: patch > Fix For: 3.x > > Attachments: trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)