[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient

Corentin Chary (JIRA) Fri, 17 Feb 2017 07:13:18 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871958#comment-15871958
 ]


Corentin Chary commented on CASSANDRA-12915:
--------------------------------------------

{code}
CREATE KEYSPACE test
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}
    AND durable_writes = true;

CREATE TABLE test.test (
    r text PRIMARY KEY,
    a text,
    b text,
    c text,
    data text
);

CREATE CUSTOM INDEX test_a_idx ON test.test (a)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {
        'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
        'case_sensitive': 'true'
    };
CREATE CUSTOM INDEX test_c_idx ON test.test (c)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {
        'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
        'case_sensitive': 'true'
    };
CREATE CUSTOM INDEX test_b_idx ON test.test (b)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {
        'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
        'case_sensitive': 'true'
    };
{code}

{code}
$ cat > generate.py
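import sys
import random

def main(args):
    # Number of CSV rows to generate, taken from the command line.
    n = int(args[1])

    # range() and // keep the script runnable on both Python 2 and Python 3.
    for i in range(n):
        a = '0'  # always '0', so a = '1' matches no rows
        b = i % 10
        c = i % (n // 10) + random.randint(0, 10)
        # Columns: r, a, b, c, data
        print("%d,%s,%d,%d,%d" % (i, a, b, c, i))

if __name__ == '__main__':
    main(sys.argv)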
$ python generate.py 2000000 > test.csv
{code}
{code}
COPY test.test FROM 'test.csv'
    WITH MAXBATCHSIZE = 100 AND MAXATTEMPTS = 10 AND MAXINSERTERRORS = 999999;
{code}

{code}
cqlsh> SELECT * FROM test.test WHERE a = '1' AND c = '38151' LIMIT 1 ALLOW FILTERING;

 r | a | b | c | data
---+---+---+---+------

(0 rows)

Tracing session: fbc23200-f522-11e6-95df-69d39475f5a8

 activity | timestamp | source | source_elapsed | client
----------+-----------+--------+----------------+--------
 Execute CQL3 query | 2017-02-17 16:08:48.288000 | 127.0.0.1 | 0 | 127.0.0.1
 Parsing SELECT * FROM test.test WHERE a = '1' AND c = '38151' LIMIT 1 ALLOW FILTERING; [Native-Transport-Requests-1] | 2017-02-17 16:08:48.288000 | 127.0.0.1 | 268 | 127.0.0.1
 Preparing statement [Native-Transport-Requests-1] | 2017-02-17 16:08:48.289000 | 127.0.0.1 | 513 | 127.0.0.1
 Index mean cardinalities are test_a_idx:-9223372036854775808,test_c_idx:-9223372036854775808. Scanning with test_a_idx. [Native-Transport-Requests-1] | 2017-02-17 16:08:48.289000 | 127.0.0.1 | 913 | 127.0.0.1
 Computing ranges to query [Native-Transport-Requests-1] | 2017-02-17 16:08:48.289000 | 127.0.0.1 | 1027 | 127.0.0.1
 Submitting range requests on 257 ranges with a concurrency of 1 (-3.24259165E16 rows per range expected) [Native-Transport-Requests-1] | 2017-02-17 16:08:48.289001 | 127.0.0.1 | 1319 | 127.0.0.1
 Submitted 1 concurrent range requests [Native-Transport-Requests-1] | 2017-02-17 16:08:48.290000 | 127.0.0.1 | 2229 | 127.0.0.1
 Executing read on test.test using index test_a_idx [ReadStage-3] | 2017-02-17 16:08:48.292000 | 127.0.0.1 | 3494 | 127.0.0.1
 Read 0 live and 0 tombstone cells [ReadStage-3] | 2017-02-17 16:08:48.293000 | 127.0.0.1 | 4694 | 127.0.0.1
 Request complete | 2017-02-17 16:08:48.292930 | 127.0.0.1 | 4930 | 127.0.0.1
{code}


Yay! No more iterating on the useless index: the read returns 0 cells and the whole request completes in about 5 ms.

Patch is on https://github.com/iksaif/cassandra/tree/sasi-null-intersect
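
To illustrate the idea (this is not the code from the patch, only a minimal Python sketch under assumptions), an intersection of two sorted token lists can return immediately when either side is empty, and otherwise walk the smaller side while seeking into the larger one. The function name `intersect_sorted` and the use of plain sorted lists of unique integer tokens are hypothetical stand-ins for the SASI range iterators.

```python
from bisect import bisect_left


def intersect_sorted(left, right):
    """Intersect two sorted lists of unique tokens (hypothetical helper).

    Illustrative only: plain lists stand in for SASI range iterators.
    """
    # An empty side means an empty result, so the other side is never scanned.
    if not left or not right:
        return []

    # Iterate over the smaller side and binary-search into the larger one.
    small, large = (left, right) if len(left) <= len(right) else (right, left)

    result = []
    pos = 0
    for token in small:
        # Seek forward in the larger list instead of stepping through it.
        pos = bisect_left(large, token, pos)
        if pos == len(large):
            break  # nothing left in the larger list can match
        if large[pos] == token:
            result.append(token)
    return result


if __name__ == '__main__':
    big = list(range(0, 600000, 2))       # ~300k matching tokens
    print(intersect_sorted([], big))      # empty side: returns at once
    print(intersect_sorted([5, 70000], big))
```

With an empty `left` (as for `a = '1'` in the test data above) the large list is never touched; with a two-element `left`, only two binary searches are made into the large list.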


> SASI: Index intersection with an empty range really inefficient
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-12915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: sasi
>            Reporter: Corentin Chary
>            Assignee: Corentin Chary
>             Fix For: 3.11.x, 4.x
>
>
> It looks like RangeIntersectionIterator.java can be pretty inefficient in 
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in 
> disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the 
> intersection (and effectively only use 'index1') the query will run in a few 
> tens of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid the use of the index for the intersection 
> when we know it will be slow. Probably when the range size factor is very 
> small and the range size is big.
> * CASSANDRA-10765



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
