[ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900948#comment-15900948
 ] 

Alex Petrov commented on CASSANDRA-12915:
-----------------------------------------

I've looked at the code once again, and it turns out that we can't rely on 
disjointness to decide whether to return an empty iterator. With a union, we 
want to return just the iterators that produce results (empty ones won't 
produce any anyway). With an intersection, even though an empty range overlaps 
with every set, we have to treat it separately, since an intersection with an 
empty iterator is always empty. I missed this yesterday, and my tests were 
passing only by chance (since the intersections were disjoint by themselves 
anyway).
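
To make the distinction concrete, here is a minimal sketch of the two rules. It 
is not the code from the patch: each iterator is modelled as a sorted set of 
tokens, and the class and method names (EmptyRangeSketch, union, intersect) are 
hypothetical and do not exist in Cassandra.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for SASI range iterators: every "iterator" is a sorted
// set of tokens. These names are assumptions made for illustration only.
final class EmptyRangeSketch {

    // Union: an empty input contributes nothing, so it is simply skipped.
    static TreeSet<Long> union(List<TreeSet<Long>> inputs) {
        TreeSet<Long> result = new TreeSet<>();
        for (TreeSet<Long> input : inputs) {
            if (input.isEmpty())
                continue;
            result.addAll(input);
        }
        return result;
    }

    // Intersection: a single empty input makes the whole result empty. This is
    // checked before any min/max comparison, because an empty input has no
    // min/max to compare.
    static TreeSet<Long> intersect(List<TreeSet<Long>> inputs) {
        if (inputs.isEmpty())
            return new TreeSet<>();
        for (TreeSet<Long> input : inputs) {
            if (input.isEmpty())
                return new TreeSet<>();
        }

        // All inputs are non-empty here, so first()/last() are safe to call.
        long lo = Long.MIN_VALUE;
        long hi = Long.MAX_VALUE;
        for (TreeSet<Long> input : inputs) {
            lo = Math.max(lo, input.first());
            hi = Math.min(hi, input.last());
        }
        if (lo > hi)
            return new TreeSet<>(); // the [min, max] ranges are disjoint

        TreeSet<Long> result = new TreeSet<>(inputs.get(0).subSet(lo, true, hi, true));
        for (TreeSet<Long> input : inputs)
            result.retainAll(input);
        return result;
    }

    public static void main(String[] args) {
        TreeSet<Long> a = new TreeSet<>(Arrays.asList(1L, 5L, 9L));
        TreeSet<Long> b = new TreeSet<>(Arrays.asList(5L, 9L, 12L));
        TreeSet<Long> empty = new TreeSet<>();

        System.out.println(union(Arrays.asList(a, empty)));
        System.out.println(intersect(Arrays.asList(a, empty)));
        System.out.println(intersect(Arrays.asList(a, b)));
    }
}
```

In this sketch the empty check and the disjointness check are two separate 
steps, which is the point above: disjointness alone can't tell us what to do 
with an empty input.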

I've addressed the issues 
[here|https://github.com/apache/cassandra/compare/trunk...ifesdjeen:12915-alternative].

A couple of comments on motivation:
  * a few more tests to make sure we cover more cases
  * one of the problems revealed by the new tests was that the original patch 
was yielding a bounce intersection iterator (which actually has min/max), but 
with an empty range. Now we consistently return an empty iterator that doesn't 
have min and max set. 
  * I wanted to avoid distinguishing the first range from the rest, mostly to 
use the same code path
  * hopefully it is now clearer when the empty iterator is going to be returned

Could you take another look at the patch and see if we have common ground here?
Thank you once again for the clarifications and discussions: it's a complex 
problem that was hard to discover and isn't simple to tackle from all sides 
simultaneously.

> SASI: Index intersection with an empty range really inefficient
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-12915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: sasi
>            Reporter: Corentin Chary
>            Assignee: Corentin Chary
>             Fix For: 3.11.x, 4.x
>
>
> It looks like RangeIntersectionIterator.java can be pretty inefficient in 
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in 
> disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the 
> intersection (and effectively only use 'index1') the query will run in a few 
> tenth of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid the use of the index for the intersection 
> when we know it will be slow. Probably when the range size factor is very 
> small and the range size is big.
> * CASSANDRA-10765



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
