DOAN DuyHai created CASSANDRA-12674:

             Summary: [SASI] Confusing AND/OR semantics for StandardAnalyzer 
                 Key: CASSANDRA-12674
             Project: Cassandra
          Issue Type: Bug
          Components: sasi
         Environment: Cassandra 3.7
            Reporter: DOAN DuyHai

Connected to Test Cluster at
[cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> use test;
cqlsh:test> CREATE TABLE sasi_bug(id int, clustering int, val text, PRIMARY 
KEY((id), clustering));
cqlsh:test> CREATE CUSTOM INDEX ON sasi_bug(val) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
    'mode': 'CONTAINS',
    'analyzed': 'true'};

//1st example SAME PARTITION KEY
cqlsh:test> INSERT INTO sasi_bug(id, clustering , val ) VALUES(1, 1, 'homeworker');
cqlsh:test> INSERT INTO sasi_bug(id, clustering , val ) VALUES(1, 2, 'hardworker');
cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%work home%';

 id | clustering | val
----+------------+------------
  1 |          1 | homeworker
  1 |          2 | hardworker

(2 rows)

//2nd example DIFFERENT PARTITION KEY
cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(10, 1, 'speedrun');
cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(11, 1, 'longrun');
cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%long run%';

 id | clustering | val
----+------------+---------
 11 |          1 | longrun

(1 rows)

In the 1st example, both rows belong to the same partition, so SASI returns both 
values. Indeed, {{LIKE '%work home%'}} means {{contains 'work' OR 'home'}}, so 
the result makes sense.

In the 2nd example, only one row is returned, whereas we expect 2 rows: 
{{LIKE '%long run%'}} means {{contains 'long' OR 'run'}}, so *speedrun* should 
be returned too.

So where is the problem? Explanation:

When there is only 1 predicate, the root operation type is an *AND*:

    private Operation analyze()
            Operation.Builder and = new Operation.Builder(OperationType.AND, ...
            return and.complete();

During the parsing of {{LIKE '%long run%'}}, SASI creates 2 expressions for the 
searched term, {{long}} and {{run}}, which corresponds to an *OR* logic. 
However, this piece of code breaks the *OR* logic:

        public Operation complete()
            if (!expressions.isEmpty())
                ListMultimap<ColumnDefinition, Expression> analyzedExpressions = analyzeGroup(controller, op, expressions);
                RangeIterator.Builder<Long, Token> range = controller.getIndexes(op, analyzedExpressions.values());

As you can see, we blindly take all the *values* of the MultiMap (which 
contains a single entry for the {{val}} column with 2 expressions) and pass them 
to {{controller.getIndexes(...)}}:

    public RangeIterator.Builder<Long, Token> getIndexes(OperationType op, Collection<Expression> expressions)
        if (resources.containsKey(expressions))
            throw new IllegalArgumentException("Can't process the same expressions multiple times.");

        RangeIterator.Builder<Long, Token> builder = op == OperationType.OR
                                                ? RangeUnionIterator.<Long, Token>builder()
                                                : RangeIntersectionIterator.<Long, Token>builder();

And because the root operation has *AND* type, the 
{{RangeIntersectionIterator}} will be used on both expressions {{long}} and 
{{run}}.

So when the data belong to different partitions, the *AND* logic applies and 
eliminates _speedrun_.
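
The effect can be reproduced outside Cassandra with plain sets. The sketch below is not SASI code: the class name, the in-memory map, and the substring lookup are assumptions made purely for illustration, with the partition ids and values taken from the 2nd example. It contrasts an intersection-style merge (what an *AND* root would do) with a union-style merge (what the *OR* reading of the query expects).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PartitionMergeSketch {
    // Hypothetical stand-in for the indexed data: partition id -> value of val.
    static final Map<Integer, String> ROWS = new LinkedHashMap<>();
    static {
        ROWS.put(10, "speedrun");
        ROWS.put(11, "longrun");
    }

    // Partition ids whose value contains the term (stand-in for a CONTAINS lookup).
    public static Set<Integer> lookup(String term) {
        Set<Integer> ids = new TreeSet<>();
        for (Map.Entry<Integer, String> e : ROWS.entrySet()) {
            if (e.getValue().contains(term)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // AND-style merge: keep only partitions matched by every term.
    public static Set<Integer> intersect(List<String> terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> ids = lookup(term);
            if (result == null) {
                result = ids;
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // OR-style merge: keep partitions matched by at least one term.
    public static Set<Integer> union(List<String> terms) {
        Set<Integer> result = new TreeSet<>();
        for (String term : terms) {
            result.addAll(lookup(term));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("long", "run");
        System.out.println("AND merge: " + intersect(terms)); // prints "AND merge: [11]"
        System.out.println("OR merge:  " + union(terms));     // prints "OR merge:  [10, 11]"
    }
}
```

With the intersection, partition 10 (_speedrun_) is dropped because it matches only {{run}}, which mirrors the single row returned by the query.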

When the data belong to the same partition but to different rows, the 
{{RangeIntersectionIterator}} returns a single partition, and then the rows are 
filtered further by {{operationTree.satisfiedBy}}, so the results are correct:

            while (currentKeys.hasNext())
                    DecoratedKey key = currentKeys.next();

                    if (!keyRange.right.isMinimum() && keyRange.right.compareTo(key) < 0)
                        return endOfData();

                    try (UnfilteredRowIterator partition = controller.getPartition(key, executionController))
                        Row staticRow = partition.staticRow();
                        List<Unfiltered> clusters = new ArrayList<>();

                        while (partition.hasNext())
                            Unfiltered row = partition.next();
                            if (operationTree.satisfiedBy(row, staticRow, true))
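
The same-partition case can be sketched as a two-stage filter. This is only an illustration under stated assumptions, not the SASI implementation: the class and method names are hypothetical, the partition-level check models the intersection as "every term hits at least one row of the partition", and the row-level check is assumed to behave like an OR, inferred from the fact that _hardworker_ is returned in the 1st example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowFilterSketch {
    // Stage 1 (assumed model of the intersection): the partition is selected
    // only if every term matches at least one of its rows.
    public static boolean partitionSelected(List<String> rows, List<String> terms) {
        for (String term : terms) {
            boolean hit = false;
            for (String row : rows) {
                if (row.contains(term)) {
                    hit = true;
                    break;
                }
            }
            if (!hit) {
                return false;
            }
        }
        return true;
    }

    // Stage 2 (assumed OR-like row filter): keep a row if it contains any term.
    public static List<String> filterRows(List<String> rows, List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            for (String term : terms) {
                if (row.contains(term)) {
                    kept.add(row);
                    break;
                }
            }
        }
        return kept;
    }

    // Rows of one partition that survive both stages.
    public static List<String> query(List<String> rows, List<String> terms) {
        return partitionSelected(rows, terms) ? filterRows(rows, terms) : new ArrayList<>();
    }

    public static void main(String[] args) {
        // 1st example: both rows live in partition id = 1.
        List<String> partition1 = Arrays.asList("homeworker", "hardworker");
        System.out.println(query(partition1, Arrays.asList("work", "home")));
        // prints "[homeworker, hardworker]"

        // 2nd example: speedrun is alone in partition id = 10.
        List<String> partition10 = Arrays.asList("speedrun");
        System.out.println(query(partition10, Arrays.asList("long", "run")));
        // prints "[]"
    }
}
```

Under this model, _hardworker_ survives only because _homeworker_ in the same partition satisfies both terms at the partition level, which is why the first query looks correct while the second does not.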

/cc [~xedin] [~ifesdjeen]

This message was sent by Atlassian JIRA
