[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244848#comment-16244848 ] Jason Brown edited comment on CASSANDRA-9988 at 11/9/17 12:15 AM: -- Overall, I think the patch is pretty solid. A couple of comments and nits: - is {{BTreeSearchIterator}} necessary anymore? It's just an empty interface now - just to confirm, {{FullBTreeSearchIterator}} is just a rename of the previous {{BTreeSearchIterator}} class? My diff'ing seems to confirm that, but I'd like the extra level of sanity checking. nits: - {{BTreeSet}} switch imports back to original format - in {{LeafBTreeSearchIterator}}, move {{#compareToFirst}} and {{#searchNext}} closer to the methods from where they are called. Or better yet, since each is only called from one place, just inline the functionality into those calling methods. My biggest problem is with the microbench. I know this ticket has gone through a bunch on contortions over it's life, so maybe a little confusion is reasonable. However, what I was really hoping to see was a comparison of the leaf vs full (tree) iterators, as that's what this whole ticket is about. I took liberty of reworking the mircobench to explicitly test the differences bewtween leaf and tree iterators, as well as reduce of benchmark executions (due to every value between 0 and 31, inclusive, being a param to the {{targetIdx}} field. (I simply removed about 2/3 of those values, as I think we only really care about testing reading from the beginning, middle, and end of the btree entries; anything else just bloats total test execution time). Further, as {{#iteratorTree()}} and {{#multiSearchFound}} don't use the {{targetIdx}} param, I think they can be moved to another class as they would be executed once for each of the entries in {{targetIdx}} (I'm not awatre of a way to ignore that in JMH). My version of your branch with the updated microbench is [here|https://github.com/jasobrown/cassandra/tree/9988] EDIT: forgot to mention that with the updated microbench, I was able to measure that the leaf iterator was anywhere from 3-20% faster, depending on the test. So, indeed this is a good patch was (Author: jasobrown): Overall, I think the patch is pretty solid. A couple of comments and nits: - is {{BTreeSearchIterator}} necessary anymore? It's just an empty interface now - just to confirm, {{FullBTreeSearchIterator}} is just a rename of the previous {{BTreeSearchIterator}} class? My diff'ing seems to confirm that, but I'd like the extra level of sanity checking. nits: - {{BTreeSet}} switch imports back to original format - in {{LeafBTreeSearchIterator}}, move {{#compareToFirst}} and {{#searchNext}} closer to the methods from where they are called. Or better yet, since each is only called from one place, just inline the functionality into those calling methods. My biggest problem is with the microbench. I know this ticket has gone through a bunch on contortions over it's life, so maybe a little confusion is reasonable. However, what I was really hoping to see was a comparison of the leaf vs full (tree) iterators, as that's what this whole ticket is about. I took liberty of reworking the mircobench to explicitly test the differences bewtween leaf and tree iterators, as well as reduce of benchmark executions (due to every value between 0 and 31, inclusive, being a param to the {{targetIdx}} field. (I simply removed about 2/3 of those values, as I think we only really care about testing reading from the beginning, middle, and end of the btree entries; anything else just bloats total test execution time). Further, as {{#iteratorTree()}} and {{#multiSearchFound}} don't use the {{targetIdx}} param, I think they can be moved to another class as they would be executed once for each of the entries in {{targetIdx}} (I'm not awatre of a way to ignore that in JMH). My version of your branch with the updated microbench is [here|https://github.com/jasobrown/cassandra/tree/9988] EDIT: forgot to mention that with the updated microbench, the leaf iterator was anywhere from 3-20% faster, depending on the test. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result.png, > 9988-result2.png, 9988-result3.png, 9988-test-result-expsearch.xlsx, > 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-test-result3.png, > 9988-trunk-new-update.txt, 9988-trunk-new.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244848#comment-16244848 ] Jason Brown edited comment on CASSANDRA-9988 at 11/8/17 10:41 PM: -- Overall, I think the patch is pretty solid. A couple of comments and nits: - is {{BTreeSearchIterator}} necessary anymore? It's just an empty interface now - just to confirm, {{FullBTreeSearchIterator}} is just a rename of the previous {{BTreeSearchIterator}} class? My diff'ing seems to confirm that, but I'd like the extra level of sanity checking. nits: - {{BTreeSet}} switch imports back to original format - in {{LeafBTreeSearchIterator}}, move {{#compareToFirst}} and {{#searchNext}} closer to the methods from where they are called. Or better yet, since each is only called from one place, just inline the functionality into those calling methods. My biggest problem is with the microbench. I know this ticket has gone through a bunch on contortions over it's life, so maybe a little confusion is reasonable. However, what I was really hoping to see was a comparison of the leaf vs full (tree) iterators, as that's what this whole ticket is about. I took liberty of reworking the mircobench to explicitly test the differences bewtween leaf and tree iterators, as well as reduce of benchmark executions (due to every value between 0 and 31, inclusive, being a param to the {{targetIdx}} field. (I simply removed about 2/3 of those values, as I think we only really care about testing reading from the beginning, middle, and end of the btree entries; anything else just bloats total test execution time). Further, as {{#iteratorTree()}} and {{#multiSearchFound}} don't use the {{targetIdx}} param, I think they can be moved to another class as they would be executed once for each of the entries in {{targetIdx}} (I'm not awatre of a way to ignore that in JMH). My version of your branch with the updated microbench is [here|https://github.com/jasobrown/cassandra/tree/9988] EDIT: forgot to mention that with the updated microbench, the leaf iterator was anywhere from 3-20% faster, depending on the test. was (Author: jasobrown): Overall, I think the patch is pretty solid. A couple of comments and nits: - is {{BTreeSearchIterator}} necessary anymore? It's just an empty interface now - just to confirm, {{FullBTreeSearchIterator}} is just a rename of the previous {{BTreeSearchIterator}} class? My diff'ing seems to confirm that, but I'd like the extra level of sanity checking. nits: - {{BTreeSet}} switch imports back to original format - in {{LeafBTreeSearchIterator}}, move {{#compareToFirst}} and {{#searchNext}} closer to the methods from where they are called. Or better yet, since each is only called from one place, just inline the functionality into those calling methods. My biggest problem is with the microbench. I know this ticket has gone through a bunch on contortions over it's life, so maybe a little confusion is reasonable. However, what I was really hoping to see was a comparison of the leaf vs full (tree) iterators, as that's what this whole ticket is about. I took liberty of reworking the mircobench to explicitly test the differences bewtween leaf and tree iterators, as well as reduce of benchmark executions (due to every value between 0 and 31, inclusive, being a param to the {{targetIdx}} field. (I simply removed about 2/3 of those values, as I think we only really care about testing reading from the beginning, middle, and end of the btree entries; anything else just bloats total test execution time). Further, as {{#iteratorTree()}} and {{#multiSearchFound}} don't use the {{targetIdx}} param, I think they can be moved to another class as they would be executed once for each of the entries in {{targetIdx}} (I'm not awatre of a way to ignore that in JMH). My version of your branch with the updated microbench is [here|https://github.com/jasobrown/cassandra/tree/9988] > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result.png, > 9988-result2.png, 9988-result3.png, 9988-test-result-expsearch.xlsx, > 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-test-result3.png, > 9988-trunk-new-update.txt, 9988-trunk-new.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118100#comment-16118100 ] Benedict edited comment on CASSANDRA-9988 at 8/8/17 12:27 PM: -- bq. The original implementation is valid. I think that's another example shows the code is not that easy to understand. Mea culpa, although I think that's more an indictment of my hubris and our mutual disinterest. If fifteen distracted minutes in a JIRA comment window procrastinating from real work yields a minor bug, it's probably not *that* complicated, and an IDE, test suite and a couple of hours would probably more than conquer it. bq. The graph is the search performance (throughput) for searching target on the different positions. Except that this isn't the same, because the search domain is different with different starting positions. This is also over-counting unrelated test setup costs. One would expect that, with a leaf size of 32, iterating the whole contents would be _approx._ 4x faster with exp. search than binary search (and 5x faster for a single search from the beginning); the fact it is only 3x is probably down to those extra costs. The worst inversion of performance would be around 2x (i.e. exp. search being 2x slower than b.search at its worst), with logarithmically spaced items - i.e. at position 16, 8, 4, 2, 1. However, in the slowest case for exp. search, we are doing much less work than the slowest for binary search. Of course, with a limit to 32 items the exp. search benefits are much less apparent. If we had a leaf size of 64 it would become much more stark. With a leaf size of 16, it would also become completely irrelevant. was (Author: benedict): bq. The original implementation is valid. I think that's another example shows the code is not that easy to understand. Mea culpa, although I think that's more an indictment of my hubris and our mutual disinterest. If fifteen distracted minutes in a JIRA comment window procrastinating from real work yields a minor bug, it's probably not *that* complicated, and an IDE, test suite and a couple of hours would probably more than conquer it. bq. The graph is the search performance (throughput) for searching target on the different positions. Except that this isn't the same, because the search domain is different with different starting positions. This is also over-counting unrelated test setup costs. One would expect that, with a leaf size of 32, iterating the whole contents would be _approx._ 4x faster with exp. search than binary search; the fact it is only 3x is probably down to those extra costs. The worst inversion of performance would be around 2x (i.e. exp. search being 2x slower than b.search at its worst), with logarithmically spaced items - i.e. at position 16, 8, 4, 2, 1. However, in the slowest case for exp. search, we are doing much less work than the slowest for binary search. Of course, with a limit to 32 items the exp. search benefits are much less apparent. If we had a leaf size of 64 it would become much more stark. With a leaf size of 16, it would also become completely irrelevant. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, > 9988-result3.png, 9988-result.png, 9988-test-result3.png, > 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, > 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116055#comment-16116055 ] Jay Zhuang edited comment on CASSANDRA-9988 at 8/7/17 5:05 AM: --- Here are the test results to search target on the different index with the following search algorithms: 1. binary search 2. exponential search 3. binary search optimized for next adjacent value (Y axis is throughput, higher is better) !9988-3tests.png|tests! Here are my takes from the results: h4. 1. binary search: Pros: * best on average and evenly distributed; * Simple code, use JAVA native API. Cons: * not optimized to search for the adjacent value. h4. 2. exponential search: Pros: * optimized to search the first and second value for {{Dir.ASC}}. Cons: * very bad average performance: compare to binarySearch it has only half the throughput. {{Dir.DESC}} will have the same performance degradation and we do use DESC for some places; * complicate code, hard to maintain: For example, there's a bug in above code, it should be {{while (prob_ub <= ub && (c = comparator.compare(keys\[probe_ub], key)) < 0)}} it only impacts the tree size {{2^n - 1}} and search for the last element, it's tricky to find out. I think we could make the algorithm work for {{Dir.DESC}} too, but it will make the code even more complicated. h4. 3. binary search optimized for next adjacent value Pros: * Optimized to search next adjacent value, works for both {{Dir.ASC}} and {{Dir.DESC}} Cons: * Average performance is a litter bit worse than binarySearch I'm not familiar with the whole cassandra code base, so not sure how often we search for the adjacent value. Personally, I would still prefer binarySearch: * consistency with other cassandra code, plus it's the current behave right now for btree search; * smaller mean deviations, could be good for tp99; * clean code. But I'm fine with any solutions: | 1. binary search | [9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] | | 2. exponential search | [9988-trunk-exp|https://github.com/cooldoger/cassandra/tree/9988-trunk-exp] | | 3. binary search optimized| [9988-trunk-x|https://github.com/cooldoger/cassandra/tree/9988-trunk-x] | was (Author: jay.zhuang): Here are the test results to search target on the different index with the following search algorithms: 1. binary search 2. exponential search 3. binary search optimized for next adjacent value !9988-3tests.png|tests! Here are my takes from the results: h4. 1. binary search: Pros: * best on average and evenly distributed; * Simple code, use JAVA native API. Cons: * not optimized to search for the adjacent value. h4. 2. exponential search: Pros: * optimized to search the first and second value for {{Dir.ASC}}. Cons: * very bad average performance: compare to binarySearch it has only half the throughput. {{Dir.DESC}} will have the same performance degradation and we do use DESC for some places; * complicate code, hard to maintain: For example, there's a bug in above code, it should be {{while (prob_ub <= ub && (c = comparator.compare(keys\[probe_ub], key)) < 0)}} it only impacts the tree size {{2^n - 1}} and search for the last element, it's tricky to find out. I think we could make the algorithm work for {{Dir.DESC}} too, but it will make the code even more complicated. h4. 3. binary search optimized for next adjacent value Pros: * Optimized to search next adjacent value, works for both {{Dir.ASC}} and {{Dir.DESC}} Cons: * Average performance is a litter bit worse than binarySearch I'm not familiar with the whole cassandra code base, so not sure how often we search for the adjacent value. Personally, I would still prefer binarySearch: * consistency with other cassandra code, plus it's the current behave right now for btree search; * smaller mean deviations, could be good for tp99; * clean code. But I'm fine with any solutions: | 1. binary search | [9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] | | 2. exponential search | [9988-trunk-exp|https://github.com/cooldoger/cassandra/tree/9988-trunk-exp] | | 3. binary search optimized| [9988-trunk-x|https://github.com/cooldoger/cassandra/tree/9988-trunk-x] | > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, > 9988-result3.png, 9988-result.png, 9988-test-result3.png, > 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, > 9988-test-result.xlsx, 9988-trunk-new.txt,
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106073#comment-16106073 ] Benedict edited comment on CASSANDRA-9988 at 7/29/17 8:22 AM: -- Assuming your implementation is correct, here is a modified version that works {code} int lb = forwards ? nextPos : lowerBound; // inclusive int ub = forwards ? upperBound : nextPos; // inclusive int bound = 0; int c = -1; while (lb + bound < ub && (c = comparator.compare(keys[lb + bound], key)) < 0) bound = Math.max(bound * 2, 1); if (c == 0) { return lb + bound; } // note, by checking == 0, we do not need to include it in bsearch, so we do not need to increment lb + bound return Arrays.binarySearch(keys, lb + (bound/2), Math.min(lb + bound, ub + 1), key, comparator); {code} This implementation as a whole could be cleaned up a little, and there are a variety of ways to bit-twiddle to get {{bound == 0}} on first iteration, but this one is at least very clear. A "faster" variant would be: {code} while (lb + (bound / 2) < ub && (c = comparator.compare(keys[lb + (bound / 2)], key)) < 0) bound *= 2; {code} A faster-still variant would just offset lb by -1 at the start, so there are no recurring costs, and restore it after the loop. But better than this would be something like: {code} int lb = forwards ? nextPos : lowerBound; // inclusive int ub = forwards ? upperBound : nextPos; // inclusive int c = -1; int probe_ub = lb; int probe_incr = 1; while (probe_ub < ub && (c = comparator.compare(keys[probe_ub], key)) < 0) { lb = probe_ub; probe_ub += probe_incr; probe_incr *= 2; } if (c == 0) { return probe_ub; } return Arrays.binarySearch(keys, lb, Math.min(ub + 1, probe_ub), key, comparator); {code} This could still be cleaned up a little, in a decent editor. I haven't looked at the codebase in a long time, and don't have it checked out, so haven't consulted the branch iterator version, and don't actually recall precisely what algorithm we used. I don't recall it being a particularly elegant approach, though. In fact, it's possible it only compares the first element and then resorts to binary search; I do not recall for sure if we carried exponential search over from the ArrayBackedColumns (or whatever it used to be called). I will note the exponential search as it stands here probably does not behave very well for reverse iteration. was (Author: benedict): Assuming your implementation is correct, here is a modified version that works {code} int lb = forwards ? nextPos : lowerBound; // inclusive int ub = forwards ? upperBound : nextPos; // inclusive int bound = 0; int c = -1; while (lb + bound < ub && (c = comparator.compare(keys[lb + bound], key)) < 0) bound = Math.max(bound * 2, 1); if (c == 0) { return lb + bound; } // note, by checking == 0, we do not need to include it in bsearch, so we do not need to increment ub return Arrays.binarySearch(keys, lb + (bound/2), Math.min(lb + bound, ub), key, comparator); {code} This implementation as a whole could be cleaned up a little, and there are a variety of ways to bit-twiddle to get {{bound == 0}} on first iteration, but this one is at least very clear. A "faster" variant would be: {code} while (lb + (bound / 2) < ub && (c = comparator.compare(keys[lb + (bound / 2)], key)) < 0) bound *= 2; {code} A faster-still variant would just offset lb by -1 at the start, so there are no recurring costs, and restore it after the loop. But better than this would be something like: {code} int lb = forwards ? nextPos : lowerBound; // inclusive int ub = forwards ? upperBound : nextPos; // inclusive int c = -1; int probe_ub = lb; int probe_incr = 1; while (probe_ub < ub && (c = comparator.compare(keys[probe_ub], key)) < 0) { lb = probe_ub; probe_ub += probe_incr; probe_incr *= 2; } if (c == 0) { return probe_ub; } return Arrays.binarySearch(keys, lb, probe_ub, key, comparator); {code} I haven't looked at the codebase in a long time, and don't have it checked out, so haven't consulted the branch iterator version, and don't actually recall precisely what algorithm we used. I don't recall it being a particularly elegant approach, though. In fact, it's possible it only compares the first element and then resorts to binary search; I do not recall for sure if we carried exponential search over from the ArrayBackedColumns (or whatever it used to be called). I will note the exponential search as it stands here probably does not
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106073#comment-16106073 ] Benedict edited comment on CASSANDRA-9988 at 7/29/17 8:18 AM: -- Assuming your implementation is correct, here is a modified version that works {code} int lb = forwards ? nextPos : lowerBound; // inclusive int ub = forwards ? upperBound : nextPos; // inclusive int bound = 0; int c = -1; while (lb + bound < ub && (c = comparator.compare(keys[lb + bound], key)) < 0) bound = Math.max(bound * 2, 1); if (c == 0) { return lb + bound; } // note, by checking == 0, we do not need to include it in bsearch, so we do not need to increment ub return Arrays.binarySearch(keys, lb + (bound/2), Math.min(lb + bound, ub), key, comparator); {code} This implementation as a whole could be cleaned up a little, and there are a variety of ways to bit-twiddle to get {{bound == 0}} on first iteration, but this one is at least very clear. A "faster" variant would be: {code} while (lb + (bound / 2) < ub && (c = comparator.compare(keys[lb + (bound / 2)], key)) < 0) bound *= 2; {code} A faster-still variant would just offset lb by -1 at the start, so there are no recurring costs, and restore it after the loop. But better than this would be something like: {code} int lb = forwards ? nextPos : lowerBound; // inclusive int ub = forwards ? upperBound : nextPos; // inclusive int c = -1; int probe_ub = lb; int probe_incr = 1; while (probe_ub < ub && (c = comparator.compare(keys[probe_ub], key)) < 0) { lb = probe_ub; probe_ub += probe_incr; probe_incr *= 2; } if (c == 0) { return probe_ub; } return Arrays.binarySearch(keys, lb, probe_ub, key, comparator); {code} I haven't looked at the codebase in a long time, and don't have it checked out, so haven't consulted the branch iterator version, and don't actually recall precisely what algorithm we used. I don't recall it being a particularly elegant approach, though. In fact, it's possible it only compares the first element and then resorts to binary search; I do not recall for sure if we carried exponential search over from the ArrayBackedColumns (or whatever it used to be called). I will note the exponential search as it stands here probably does not behave very well for reverse iteration. Also, apologies for against-style-guide variable names. But this comment box is too poor an editor to revise it any further. was (Author: benedict): Assuming your implementation is correct, here is a modified version that works {code} int lb = forwards ? nextPos : lowerBound; // inclusive int ub = forwards ? upperBound : nextPos; // inclusive int bound = 0; int c = -1; while (lb + bound < ub && (c = comparator.compare(keys[lb + bound], key)) < 0) bound = Math.max(bound * 2, 1); if (c == 0) { return lb + bound; } // note, by checking == 0, we do not need to include it in bsearch, so we do not need to increment ub return Arrays.binarySearch(keys, lb + (bound/2), Math.min(lb + bound, ub), key, comparator); {code} This implementation as a whole could be cleaned up a little, and there are a variety of ways to bit-twiddle to get {{bound == 0}} on first iteration, but this one is at least very clear. A "faster" variant would be: {code} while (lb + (bound / 2) < ub && (c = comparator.compare(keys[lb + (bound / 2)], key)) < 0) bound *= 2; {code} A faster-still variant would just offset lb by -1 at the start, so there are no recurring costs, and restore it after the loop. But better than this would be something like: {code} int lb = forwards ? nextPos : lowerBound; // inclusive int ub = forwards ? upperBound : nextPos; // inclusive int c = -1; int probe_ub = lb; int probe_incr = 1; while (probe_ub < ub && (c = comparator.compare(keys[probe_ub], key)) < 0) { lb = probe_ub; probe_ub += probe_incr; probe_incr *= 2; } if (c == 0) { return probe_ub; } return Arrays.binarySearch(keys, lb, probe_ub, key, comparator); {code} better still I haven't looked at the codebase in a long time, and don't have it checked out, so haven't consulted the branch iterator version, and don't actually recall precisely what algorithm we used. I don't recall it being a particularly elegant approach, though. In fact, it's possible it only compares the first element and then resorts to binary search; I do not recall for sure if we carried exponential search over from the ArrayBackedColumns (or whatever it used to be called). I will note the
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103033#comment-16103033 ] Benedict edited comment on CASSANDRA-9988 at 7/27/17 10:09 AM: --- Well, it's been a while since I looked at the internals, but iirc there is an extremely common scenario in which we enumerate most of the contents of such an iterator via search, and that is in the case of selecting columns from a row. Since we already have binary search's best case and exponential search's worst case in the benchmarks, and given we can expect the inverse scenario to be common, why not supplement with that so we can see the two extremes? Also, FTR: a properly implemented exponential search should only incur 1 comparison for each search in its best case, as the result of the most recent boundary comparison can be saved - if it is zero then no binary search is needed over the range. was (Author: benedict): Well, it's been a while since I looked at the internals, but iirc there is an extremely common scenario in which we enumerate most of the contents of such an iterator via search, and that is in the case of selecting columns from a row. Since we already have binary search's best case and exponential search's worst case in the benchmarks, and given we can expect the inverse scenario to be common, why not supplement with that so we can see the two extremes? > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, > 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102704#comment-16102704 ] Jay Zhuang edited comment on CASSANDRA-9988 at 7/27/17 4:57 AM: Hi [~benedict], re-read [your comments 1.5 years' ago|https://issues.apache.org/jira/browse/CASSANDRA-9988?focusedCommentId=15117219=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15117219], now I see your point: {quote} 2. We could probably get uniformly better behaviour by implementing exponential search for the leaf iterator; this would bring our total number of comparisons down from n lg n to n when searching *an approximately equal set of items*, without harming our best case complexity. {quote} In that case, {{i}} would be very small ({{i}} is where the target is), so if i = 1, expSearch ({{log( i ) + log( i )}}) just needs to do 2 comparisons. vs. binarySearch 5 comparisons: log(32) = 5 in worst case. Now the question is how we should define the test: How many re-seeks should we do? How we choose next target? If we randomly select the next target, it's unfair to expSearch, if we always select next value (basically use search to iterator the tree) is unfair to binSearch. It really depends on the user scenario. Do you have any suggestion? was (Author: jay.zhuang): Hi [~benedict], re-read [your comments 1.5 years' ago|https://issues.apache.org/jira/browse/CASSANDRA-9988?focusedCommentId=15117219=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15117219], now I see your point: {quote} 2. We could probably get uniformly better behaviour by implementing exponential search for the leaf iterator; this would bring our total number of comparisons down from n lg n to n when searching *an approximately equal set of items*, without harming our best case complexity. {quote} In that case, {{i}} would be very small ({{i}} is where the target is), so expSearch ({{log( i ) + log( i )}}) just needs to do 2 comparisons. vs. binarySearch 5 comparisons: log(32) = 5 in worst case. Now the question is how we should define the test: How many re-seeks should we do? How we choose next target? If we randomly select the next target, it's unfair to expSearch, if we always select next value (basically use search to iterator the tree) is unfair to binSearch. It really depends on the user scenario. Do you have any suggestion? > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, > 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102332#comment-16102332 ] Jay Zhuang edited comment on CASSANDRA-9988 at 7/26/17 9:46 PM: They're good feedbacks. I did the tests for expSearch: !9988-test-result3.png|test! I think the exponentialSearch didn't out perform the binaraySearch is because: # data size is too small. In this case, it only searches inside of one leaf, the maximum data size is 32 (default node size). In theory, the performance of exponential search is {{[O(log( i ) + log( i ))|https://en.wikipedia.org/wiki/Exponential_search#Performance]}} ({{i}} is where the target is), vs. binary search is {{O(log( n ))}}. Because exponentialSearch has 2 stages, it could do more comparisons when the {{n}} is small. # Maybe Java API {{Arrays.binarySearch()}} is already doing some optimizations. So for this JIRA, I don't think it worth doing exponentialSearch. was (Author: jay.zhuang): They're good feedbacks. I did the tests for expSearch: !9988-test-result3.png|test! I think the exponentialSearch didn't out perform the binaraySearch is because: # data size is too small. In this case, it only searches inside of one leaf, the maximum data size is 32 (default node size). In theroy, the performance of exponential search is {{[O(log( i ) + log( i ))|https://en.wikipedia.org/wiki/Exponential_search#Performance]}} ({{i}} is where the target is), vs. binary search is {{O(log( n ))}}. Because exponentialSearch has 2 stages, it could do more compares when the {{n}} is small. # Maybe Java API {{Arrays.binarySearch()}} is already doing some optimizations. So for this JIRA, I don't think it worth doing exponentialSearch. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-test-result3.png, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099288#comment-16099288 ] Jay Zhuang edited comment on CASSANDRA-9988 at 7/25/17 12:23 AM: - hi [~benedict] I also added tests for different cellSize: 36, 100, 1k and 10K (they're generated from random UUID.tostring()). In the graph above, they're corresponding to 4 data points in each bTree size section. We don't see a significant difference between them. For example {{IteratorTree}}, the first 4 data points are the ones for one-node-BTree with different cellSize, they have the similar improvements (+800%), maybe here is better graph: !9988-result3.png|result! was (Author: jay.zhuang): hi [~benedict] I also added tests for different cellSize: 36, 100, 1k and 10K (they're generated from random UUID.tostring()). In the graph above, they're corresponding to 4 data points in each bTree size section. We don't see a significant difference between them. For example {{IteratorTree}}, the first 4 data points are the ones for one-node-BTree with different cellSize, they have the similar improvements (+800%), maybe here is better graph: !9988-result2.png|result! > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, > 9988-result.png, 9988-trunk-new.txt, 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097112#comment-16097112 ] Jay Zhuang edited comment on CASSANDRA-9988 at 7/22/17 5:18 AM: I added tests for different cell size and 10K/100K b-tree size: | branch | utest | |[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk]|[circleci#41|https://circleci.com/gh/cooldoger/cassandra/41]| |[9988-nofix|https://github.com/cooldoger/cassandra/tree/9988-nofix]|| If the data could fit into one leaf node (<=32), the performance is significantly improved. Otherwise, there's almost no performance difference, regardless of the cell size. !9988-data.png|data! !9988-result.png|result! was (Author: jay.zhuang): I added tests for different cell size and 10K/100K b-tree size: | branch | utest | |[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk]|[circleci#41|https://circleci.com/gh/cooldoger/cassandra/41]| |[9988-nofix|https://github.com/cooldoger/cassandra/tree/9988-nofix]|| If the data could fit into one leaf node (<=32), the performance is significantly improved. Otherwise, there's no performance difference, regardless of the cell size. !9988-data.png|data! !9988-result.png|result! > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result.png, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097112#comment-16097112 ] Jay Zhuang edited comment on CASSANDRA-9988 at 7/22/17 5:14 AM: I added tests for different cell size and 10K/100K b-tree size: | branch | utest | |[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk]|[circleci#41|https://circleci.com/gh/cooldoger/cassandra/41]| |[9988-nofix|https://github.com/cooldoger/cassandra/tree/9988-nofix]|| If the data could fit into one leaf node (<=32), the performance is significantly improved. Otherwise, there's no performance difference, regardless of the cell size. !9988-data.png|data! !9988-result.png|result! was (Author: jay.zhuang): I added tests for different cell size and 10K and 100K b-tree size: | branch | utest | |[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk]|[circleci#41|https://circleci.com/gh/cooldoger/cassandra/41]| |[9988-nofix|https://github.com/cooldoger/cassandra/tree/9988-nofix]|| If the data could fit into one leaf node (<=32), the performance is significantly improved. Otherwise, there's no performance difference, regardless of the cell size. !9988-data.png|data! !9988-result.png|result! > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-data.png, 9988-result.png, 9988-trunk-new.txt, > 9988-trunk-new-update.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094089#comment-16094089 ] Anthony Grasso edited comment on CASSANDRA-9988 at 7/20/17 5:37 AM: [~jay.zhuang] two other things I noticed as far as the code goes * The patch needs to be updated so that it applies cleanly to _trunk_. * Some minor coding style updates are needed. I have left a comments on the [9988-trunk-onecommit-update2 | https://github.com/cooldoger/cassandra/commit/c5003812327d4475a7fb29f11686c13eeb50d693] branch. Let me know if you need a hand with either of these. I am happy to make that change as well. was (Author: anthony grasso): [~jay.zhuang] one other thing I noticed is that the patch needs to be updated so that it applies cleanly to _trunk_. Let me know if you need a hand with that. I am happy to make that change as well. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092832#comment-16092832 ] Anthony Grasso edited comment on CASSANDRA-9988 at 7/20/17 5:33 AM: Have started reviewing the code. Ran the partition unit tests on the branch {noformat} ant test -Dtest.name=org.apache.cassandra.db.partition.PartitionImplementationTest ant test -Dtest.name=org.apache.cassandra.db.partition.PartitionUpdateTest {noformat} So far so good. was (Author: anthony grasso): Have reviewed the code. Ran the Microbench tests {noformat} ant test -Dtest.name=org.apache.cassandra.test.microbench.CompactionBench ant test -Dtest.name=org.apache.cassandra.test.microbench.BTreeSearchIteratorBench ant test -Dtest.name=org.apache.cassandra.test.microbench.DirectorySizerBench ant test -Dtest.name=org.apache.cassandra.test.microbench.FastThreadExecutor ant test -Dtest.name=org.apache.cassandra.test.microbench.FastThreadLocalBench ant test -Dtest.name=org.apache.cassandra.test.microbench.MutationBench ant test -Dtest.name=org.apache.cassandra.test.microbench.OutputStreamBench ant test -Dtest.name=org.apache.cassandra.test.microbench.OutputStreamBench ant test -Dtest.name=org.apache.cassandra.test.microbench.OutputStreamBench ant test -Dtest.name=org.apache.cassandra.test.microbench.PendingRangesBench ant test -Dtest.name=org.apache.cassandra.test.microbench.ReadWriteTest ant test -Dtest.name=org.apache.cassandra.test.microbench.StreamingHistogramBench ant test -Dtest.name=org.apache.cassandra.test.microbench.PartitionImplementationTest {noformat} Ran the partition unit tests {noformat} ant test -Dtest.name=org.apache.cassandra.db.partition.PartitionImplementationTest ant test -Dtest.name=org.apache.cassandra.db.partition.PartitionUpdateTest {noformat} Changes look good to me. > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966767#comment-15966767 ] Jay Zhuang edited comment on CASSANDRA-9988 at 4/13/17 12:21 AM: - Rebased the [patch|https://github.com/cooldoger/cassandra/commit/c5003812327d4475a7fb29f11686c13eeb50d693.patch]. Please review: [9988-trunk-onecommit-update2|https://github.com/apache/cassandra/compare/trunk...cooldoger:9988-trunk-onecommit-update2?expand=1] was (Author: jay.zhuang): Rebased the [patch|https://github.com/apache/cassandra/compare/trunk...cooldoger:9988-trunk-onecommit-update2?expand=1] on trunk. Please review: [9988-trunk-onecommit-update2|https://github.com/apache/cassandra/compare/trunk...cooldoger:9988-trunk-onecommit-update2?expand=1] > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838635#comment-15838635 ] Jay Zhuang edited comment on CASSANDRA-9988 at 1/25/17 9:42 PM: Thanks Benedict for your quick response. I updated the [patch|https://github.com/cooldoger/cassandra/commit/d2e4621542149c68b23cb36d83289d92d722f9a0] with your comment. [~jasobrown] would you please review the patch? was (Author: jay.zhuang): Thanks Benedict for your quick response. I updated the patch with your comment. [~jasobrown] would you please review the patch? > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833718#comment-15833718 ] Jay Zhuang edited comment on CASSANDRA-9988 at 1/22/17 10:06 PM: - [Patch|https://github.com/cooldoger/cassandra/commit/8704eb8bbf94d5b556f67386d464a015ca98554b] is ready, please review. Here is the JMH test result before and after the patch, overall it significantly improved the iterator for small BTree: || [TestCode|https://github.com/cooldoger/cassandra/commit/5ae5397e1a4e2d8c5967cb847c2578df018e1def#diff-f687440fb97859f7be2813bf6182ba3eR89] || Before the patch || After the patch || Improvement || | iteratorBigTreeTest | 171.271 | 167.288 | -2% | | searchBigTreeTest | 12743.251 | 11833.277 | -7% | | searchNotFoundBigTreeTest | 11140.872 | 10378.794 | -7% | | iteratorLeafTreeTest | 5337.291 | 19788.253 | +271% | | searchFoundLeafTreeTest | 18608.504 | 36809.601 | +98% | | searchNotFoundLeafTreeTest | 19670.935 | 28880.540 | +47% | | iteratorOneElemTreeTest | 65983.946 | 414832.010 | +529% | | searchFoundOneElemTreeTest | 45796.932 | 288174.238 | +529% | | searchNotFoundOneElemTreeTest | 43176.354 | 284008.493 | +558% | Here is the code for JMH test without fix: https://github.com/cooldoger/cassandra/tree/9988-nofix I tried exponential search: https://github.com/cooldoger/cassandra/commit/a23a3422a894acb72a47975da035bbf8da50fc4b But the JMH test result is worse (other tests didn't change too much): || || Binary search || Exponential search || Improvement || | searchFoundLeafTreeTest | 36809.601 | 30074.161 | -18% | | searchNotFoundLeafTreeTest | 28880.540 | 24540.700 | -15% | | searchFoundOneElemTreeTest | 288174.238 | 233786.942 | -19% | | searchNotFoundOneElemTreeTest | 284008.493 | 199549.589 | -30% | Here are the JMH test results: {noformat} No Fix [java] BenchmarkMode Cnt Score Error Units [java] BTreeSearchIteratorBench.iteratorBigTreeTestthrpt 40 171.271 ? 3.872 ops/ms [java] BTreeSearchIteratorBench.iteratorLeafTreeTest thrpt 40 5337.291 ? 306.129 ops/ms [java] BTreeSearchIteratorBench.iteratorOneElemTreeTestthrpt 40 65983.946 ? 10651.607 ops/ms [java] BTreeSearchIteratorBench.searchBigTreeTest thrpt 40 12743.251 ? 1171.717 ops/ms [java] BTreeSearchIteratorBench.searchFoundLeafTreeTestthrpt 40 18608.504 ? 1671.129 ops/ms [java] BTreeSearchIteratorBench.searchFoundOneElemTreeTest thrpt 40 45796.932 ? 6543.235 ops/ms [java] BTreeSearchIteratorBench.searchNotFoundBigTreeTest thrpt 40 11140.872 ? 1355.109 ops/ms [java] BTreeSearchIteratorBench.searchNotFoundLeafTreeTest thrpt 40 19670.935 ? 1775.676 ops/ms [java] BTreeSearchIteratorBench.searchNotFoundOneElemTreeTest thrpt 40 43176.354 ? 6862.149 ops/ms Fix [Binary Search] [java] BenchmarkMode Cnt Score Error Units [java] BTreeSearchIteratorBench.iteratorBigFullTreeTestthrpt 40 147.093 ? 7.968 ops/ms [java] BTreeSearchIteratorBench.iteratorBigTreeTestthrpt 40 167.288 ? 4.029 ops/ms [java] BTreeSearchIteratorBench.iteratorLeafTreeTest thrpt 40 19788.253 ? 887.529 ops/ms [java] BTreeSearchIteratorBench.iteratorOneElemTreeTestthrpt 40 414832.010 ? 16663.043 ops/ms [java] BTreeSearchIteratorBench.searchBigTreeTest thrpt 40 11833.277 ? 1290.791 ops/ms [java] BTreeSearchIteratorBench.searchFoundLeafTreeTestthrpt 40 36809.601 ? 2403.004 ops/ms [java] BTreeSearchIteratorBench.searchFoundOneElemTreeTest thrpt 40 288174.238 ? 11618.352 ops/ms [java] BTreeSearchIteratorBench.searchNotFoundBigTreeTest thrpt 40 10378.794 ? 1229.196 ops/ms [java] BTreeSearchIteratorBench.searchNotFoundLeafTreeTest thrpt 40 28880.540 ? 1875.317 ops/ms [java] BTreeSearchIteratorBench.searchNotFoundOneElemTreeTest thrpt 40 284008.493 ? 7929.286 ops/ms Fix [Exponential search] [java] BenchmarkMode Cnt Score Error Units [java] BTreeSearchIteratorBench.iteratorBigFullTreeTestthrpt 40 164.849 ? 3.571 ops/ms [java] BTreeSearchIteratorBench.iteratorBigTreeTestthrpt 40 161.228 ? 6.219 ops/ms [java] BTreeSearchIteratorBench.iteratorLeafTreeTest thrpt 40 19828.490 ? 727.999 ops/ms [java] BTreeSearchIteratorBench.iteratorOneElemTreeTestthrpt 40 397640.373 ? 18525.227 ops/ms [java] BTreeSearchIteratorBench.searchBigTreeTest thrpt 40 11816.892 ? 1305.738 ops/ms [java]
[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824837#comment-15824837 ] Jay Zhuang edited comment on CASSANDRA-9988 at 1/17/17 1:56 AM: Can I take this one? I'd like to work on this JIRA. was (Author: jay.zhuang): Can I take this one? > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Priority: Minor > Labels: patch > Fix For: 3.x > > Attachments: trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)