[
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15728412#comment-15728412
]
Alex Petrov commented on CASSANDRA-12915:
-----------------------------------------
Thank you a lot for hunting down the issue, investigating it and putting an
effort to compose a patch and benchmarks.
I'll address the issues you've mentioned a out of order, starting with the
worst case.
I think that this one can be fixed in a bit different way: we could just add an
iterator over the empty range in order to make sure that we do
not start iterating over all the tokens wasting cycles. I've tested it locally
and it works just fine. I would say, relying on the ratio in this
case is not very good, as the problem is not the incorrectly picked index (or
their order) but the fact it's empty. This is a very nice
catch though. I think we should add more tests for such edge cases.
bq. improvement with 2M rows.
I'd like to point out that this only means "improvement with 2M of rows
_matching the result for one of the indexes_". Simply having more rows
won't make big (or any) difference. Although I would argue that having an index
over the (super-)high-cardinality value is not always a very
good idea.
I've also done some benchmarking. I've used your dataset and the only
difference in setup I noted is that I had tracing off.
So far the differences in query performance between "2M index result" + "10
item index result" and "10 item index result' only are rather insignificant.
Re-running both queries
changes how much the code is warmed up and how much data is is already in a
buffer / cache. I can not find any differences that'd be anywhere close 4x.
Without optimisation:
{code}
Evaluation count : 120840 in 60 samples of 2014 calls.
Execution time mean : 529.665728 µs
Execution time std-deviation : 28.887664 µs
Execution time lower quantile : 497.592646 µs ( 2.5%)
Execution time upper quantile : 597.210481 µs (97.5%)
Overhead used : 9.220078 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 40.1467 % Variance is moderately inflated by outliers
{code}
With optimisation:
{code}
Evaluation count : 131820 in 60 samples of 2197 calls.
Execution time mean : 459.233418 µs
Execution time std-deviation : 13.307538 µs
Execution time lower quantile : 442.484287 µs ( 2.5%)
Execution time upper quantile : 492.580804 µs (97.5%)
Overhead used : 8.720718 ns
Found 5 outliers in 60 samples (8.3333 %)
low-severe 5 (8.3333 %)
Variance from outliers : 15.8052 % Variance is moderately inflated by outliers
{code}
I've tried measuring with different tools although results are pretty much the
same.
Now a little bit on the theoretical/code-based grounds for my current reasoning.
The way {{RangeIterator}} work and how it's optimised, we're picking the
["shortest" token
tree|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L234-L237]
and start skipping the second token tree
based on the results retrieved from the first one. So one of the indexes will
be iterated _anyways_. The second index (since it's larger) might
have to fetch/load roughly 10 blocks (since they might be located far from one
another on disk), but it _never_ has to fetch all 2M items. It'll
iterate _only_ as many items as the smallest index has. For example, two index
queries would skip through (left is which index is used, right
is index of the item within the token tree):
{code}
SI_test_c_idx.db idx 0
SI_test_a_idx.db idx 209
SI_test_c_idx.db idx 1
SI_test_a_idx.db idx 72
SI_test_c_idx.db idx 2
SI_test_a_idx.db idx 217
SI_test_c_idx.db idx 3
SI_test_a_idx.db idx 238
SI_test_c_idx.db idx 4
SI_test_a_idx.db idx 160
SI_test_c_idx.db idx 5
SI_test_a_idx.db idx 242
SI_test_c_idx.db idx 6
SI_test_a_idx.db idx 42
SI_test_c_idx.db idx 7
SI_test_a_idx.db idx 223
SI_test_c_idx.db idx 8
SI_test_a_idx.db idx 115
SI_test_c_idx.db idx 9
SI_test_a_idx.db idx 205
SI_test_c_idx.db idx 10
{code}
If we ran the query with an optimisation and used just one index:
{code}
SI_test_c_idx.db idx 0
SI_test_c_idx.db idx 1
SI_test_c_idx.db idx 2
SI_test_c_idx.db idx 3
SI_test_c_idx.db idx 4
SI_test_c_idx.db idx 5
SI_test_c_idx.db idx 6
SI_test_c_idx.db idx 7
SI_test_c_idx.db idx 8
SI_test_c_idx.db idx 9
SI_test_c_idx.db idx 10
{code}
And this is pretty much the only difference in the code path that we're taking
in both cases.
Moreover, RangeIterator already has different [optimisation
strategies|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L76-L78]
based on differences in cardinality. I'd say that current benchmark
shows that the query is slightly slower (since we have to go through around
twice as much data on disk). But given the numbers at hand
the difference that is small and it's sub-millisecond, this optimisation seems
not to pay the complexity that it brings.
Do you think my reasoning is correct? I hope I did not miss anything.
As regards empty iterator optimisation, I think we should take it in. It's
bringing in a very significant optimisation for a very
realistic case.
> SASI: Index intersection can be very inefficient
> ------------------------------------------------
>
> Key: CASSANDRA-12915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
> Project: Cassandra
> Issue Type: Improvement
> Components: sasi
> Reporter: Corentin Chary
> Fix For: 3.x
>
>
> It looks like RangeIntersectionIterator.java and be pretty inefficient in
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in
> disk.TokenTree.getTokenAt().
> if I patch RangeIntersectionIterator so that it doesn't try to do the
> intersection (and effectively only use 'index1') the query will run in a few
> tenth of milliseconds.
> I see multiple solutions for that:
> * Add a static thresold to avoid the use of the index for the intersection
> when we know it will be slow. Probably when the range size factor is very
> small and the range size is big.
> * CASSANDRA-10765
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)