[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection can be very inefficient

Alex Petrov (JIRA) Wed, 07 Dec 2016 02:39:02 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15728412#comment-15728412
 ]


Alex Petrov commented on CASSANDRA-12915:
-----------------------------------------

Thank you a lot for hunting down the issue, investigating it and putting an 
effort to compose a patch and benchmarks. 

I'll address the issues you've mentioned a out of order, starting with the 
worst case.

I think that this one can be fixed in a bit different way: we could just add an 
iterator over the empty range in order to make sure that we do 
not start iterating over all the tokens wasting cycles. I've tested it locally 
and it works just fine. I would say, relying on the ratio in this 
case is not very good, as the problem is not the incorrectly picked index (or 
their order) but the fact it's empty. This is a very nice
catch though. I think we should add more tests for such edge cases.

bq. improvement with 2M rows.

I'd like to point out that this only means "improvement with 2M of rows 
_matching the result for one of the indexes_". Simply having more rows 
won't make big (or any) difference. Although I would argue that having an index 
over the (super-)high-cardinality value is not always a very 
good idea. 

I've also done some benchmarking. I've used your dataset and the only 
difference in setup I noted is that I had tracing off. 

So far the differences in query performance between "2M index result" + "10 
item index result" and "10 item index result' only are rather insignificant. 
Re-running both queries 
changes how much the code is warmed up and how much data is is already in a 
buffer / cache. I can not find any differences that'd be anywhere close 4x. 

Without optimisation: 

{code}
                Evaluation count : 120840 in 60 samples of 2014 calls.
             Execution time mean : 529.665728 µs
    Execution time std-deviation : 28.887664 µs
   Execution time lower quantile : 497.592646 µs ( 2.5%)
   Execution time upper quantile : 597.210481 µs (97.5%)
                   Overhead used : 9.220078 ns

Found 2 outliers in 60 samples (3.3333 %)
        low-severe       2 (3.3333 %)
 Variance from outliers : 40.1467 % Variance is moderately inflated by outliers
{code}

With optimisation:

{code}
                Evaluation count : 131820 in 60 samples of 2197 calls.
             Execution time mean : 459.233418 µs
    Execution time std-deviation : 13.307538 µs
   Execution time lower quantile : 442.484287 µs ( 2.5%)
   Execution time upper quantile : 492.580804 µs (97.5%)
                   Overhead used : 8.720718 ns

Found 5 outliers in 60 samples (8.3333 %)
        low-severe       5 (8.3333 %)
 Variance from outliers : 15.8052 % Variance is moderately inflated by outliers
{code}

I've tried measuring with different tools although results are pretty much the 
same. 

Now a little bit on the theoretical/code-based grounds for my current reasoning.

The way {{RangeIterator}} work and how it's optimised, we're picking the 
["shortest" token 
tree|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L234-L237]
 and start skipping the second token tree
based on the results retrieved from  the first one. So one of the indexes will 
be iterated _anyways_. The second index (since it's larger) might 
have to fetch/load roughly  10 blocks (since they might be located far from one 
another on disk), but it _never_ has to fetch all 2M items. It'll 
iterate _only_  as many items as the smallest index has. For example, two index 
queries would skip through (left is which index is used, right 
is index of the item within the token tree): 

{code}
SI_test_c_idx.db idx 0
SI_test_a_idx.db idx 209
SI_test_c_idx.db idx 1
SI_test_a_idx.db idx 72
SI_test_c_idx.db idx 2
SI_test_a_idx.db idx 217
SI_test_c_idx.db idx 3
SI_test_a_idx.db idx 238
SI_test_c_idx.db idx 4
SI_test_a_idx.db idx 160
SI_test_c_idx.db idx 5
SI_test_a_idx.db idx 242
SI_test_c_idx.db idx 6
SI_test_a_idx.db idx 42
SI_test_c_idx.db idx 7
SI_test_a_idx.db idx 223
SI_test_c_idx.db idx 8
SI_test_a_idx.db idx 115
SI_test_c_idx.db idx 9
SI_test_a_idx.db idx 205
SI_test_c_idx.db idx 10
{code}

If we ran the query with an optimisation and used just one index: 

{code}
SI_test_c_idx.db idx 0
SI_test_c_idx.db idx 1
SI_test_c_idx.db idx 2
SI_test_c_idx.db idx 3
SI_test_c_idx.db idx 4
SI_test_c_idx.db idx 5
SI_test_c_idx.db idx 6
SI_test_c_idx.db idx 7
SI_test_c_idx.db idx 8
SI_test_c_idx.db idx 9
SI_test_c_idx.db idx 10
{code}

And this is pretty much the only difference in the code path that we're taking 
in both cases.

Moreover, RangeIterator already has different [optimisation 
strategies|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L76-L78]
 based on differences in cardinality. I'd say that current benchmark
shows that the query is slightly slower (since we have to go through around 
twice as much data on disk). But given the numbers at hand 
the difference that is small and it's sub-millisecond, this optimisation seems 
not to pay the complexity that it brings. 

Do you think my reasoning is correct? I hope I did not miss anything. 

As regards empty iterator optimisation, I think we should take it in. It's 
bringing in a very significant optimisation for a very 
realistic case.

> SASI: Index intersection can be very inefficient
> ------------------------------------------------
>
>                 Key: CASSANDRA-12915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: sasi
>            Reporter: Corentin Chary
>             Fix For: 3.x
>
>
> It looks like RangeIntersectionIterator.java and be pretty inefficient in 
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in 
> disk.TokenTree.getTokenAt().
> if I patch RangeIntersectionIterator so that it doesn't try to do the 
> intersection (and effectively only use 'index1') the query will run in a few 
> tenth of milliseconds.
> I see multiple solutions for that:
> * Add a static thresold to avoid the use of the index for the intersection 
> when we know it will be slow. Probably when the range size factor is very 
> small and the range size is big.
> * CASSANDRA-10765



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection can be very inefficient

Reply via email to