[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection can be very inefficient

Alex Petrov (JIRA) Thu, 24 Nov 2016 06:15:07 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693384#comment-15693384
 ]


Alex Petrov commented on CASSANDRA-12915:
-----------------------------------------

I don't think we can just drop it; we should fix it. As far as I understand, 
it should find the expression with the fewest sstables for the given data range 
(but I might be mistaken). We should investigate a bit deeper. 

bq. Skipping the following indexes if we already found one with less tokens 
than command.limits().count()

This one most likely won't work for CONTAINS queries, since we do not know how 
many items will get filtered out in the end. That said, all iterators are lazy, 
so just having them in the correct order (from low to high cardinality, so that 
we fetch tokens from the low-cardinality indexes and skip to those tokens in 
the higher-cardinality ones) and having filtering turned on where applicable 
should suffice. If this is still a problem after the main issue is solved, we 
can revisit it.
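
To illustrate the ordering argument, here is a minimal sketch, not SASI's 
actual RangeIntersectionIterator. The class names SortedTokens and 
LeapfrogIntersection, the skipTo method and the seeks counter are all 
hypothetical. It assumes each index exposes its tokens as a sorted, 
duplicate-free sequence and that the sizes are known up front, which is exactly 
the estimate that is not available today.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one index's token stream (sorted ascending, no duplicates).
class SortedTokens {
    private final long[] tokens;
    private int pos = 0;
    int seeks = 0; // number of skipTo calls, a rough proxy for index seeks

    SortedTokens(long... tokens) { this.tokens = tokens; }

    int size() { return tokens.length; }
    boolean hasNext() { return pos < tokens.length; }
    long peek() { return tokens[pos]; }
    void advance() { pos++; }

    // Move to the first remaining token >= target (binary search over the tail).
    void skipTo(long target) {
        seeks++;
        int i = Arrays.binarySearch(tokens, pos, tokens.length, target);
        pos = i >= 0 ? i : -i - 1;
    }
}

class LeapfrogIntersection {
    // Drive the intersection from the smaller stream and only skip in the larger one.
    static List<Long> intersect(SortedTokens a, SortedTokens b) {
        SortedTokens small = a.size() <= b.size() ? a : b;
        SortedTokens large = small == a ? b : a;
        List<Long> out = new ArrayList<>();
        while (small.hasNext()) {
            long t = small.peek();
            large.skipTo(t);
            if (!large.hasNext()) break; // larger stream exhausted, no more matches
            if (large.peek() == t) out.add(t);
            small.advance();
        }
        return out;
    }
}
```

With this ordering the larger stream is touched once per token of the smaller 
one, so a 2-token index intersected with a much larger one costs two skips 
instead of a walk over the large index.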

bq. Ordering expressions with a score

From my basic understanding (I haven't written SASI, only worked on some 
subset of it), that should help. However, that has to be very well tested 
(tracing range iterators and understanding whether the order changes the number 
of seeks), benchmarked and checked for correctness. 

bq.  Do you have a good suggestion to do that without doing the search ?

It's not available for now. But since there will be a format change in the next 
version, we could add it.

> SASI: Index intersection can be very inefficient
> ------------------------------------------------
>
>                 Key: CASSANDRA-12915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: sasi
>            Reporter: Corentin Chary
>             Fix For: 3.x
>
>
> It looks like RangeIntersectionIterator.java can be pretty inefficient in 
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in 
> disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the 
> intersection (and effectively only use 'index1') the query will run in a few 
> tenth of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid the use of the index for the intersection 
> when we know it will be slow. Probably when the range size factor is very 
> small and the range size is big.
> * CASSANDRA-10765



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
