DOAN DuyHai created CASSANDRA-11130:
---------------------------------------

             Summary: [SASI Pre-QA] = semantics not respected when using StandardAnalyzer
                 Key: CASSANDRA-11130
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11130
             Project: Cassandra
          Issue Type: Bug
          Components: CQL
            Reporter: DOAN DuyHai


Tested from build 
[CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067]

{code:sql}
CREATE KEYSPACE music WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '1'}  AND durable_writes = true;

CREATE TABLE music.albums (
    id int PRIMARY KEY,
    artist text,
    title1 text,
    title2 text
);

CREATE CUSTOM INDEX ON music.albums (title1) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = 
{'tokenization_skip_stop_words': 'true', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 
'false', 'mode': 'PREFIX', 'tokenization_enable_stemming': 'true'};

CREATE CUSTOM INDEX ON music.albums (title2) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = 
{'tokenization_skip_stop_words': 'true', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 
'false', 'mode': 'CONTAINS', 'tokenization_enable_stemming': 'true'};

INSERT INTO music.albums(id, artist, title1, title2) VALUES(1, 'Superpitcher', 
'Yesterday', 'Yesterday');
INSERT INTO music.albums(id, artist, title1, title2) VALUES(2, 'Hilary Duff', 
'So Yesterday', 'So Yesterday');
INSERT INTO music.albums(id, artist, title1, title2) VALUES(3, 
'The Mr. T Experience', 'Yesterday Rules', 'Yesterday Rules');

SELECT artist,title1 FROM music.albums WHERE title1='Yesterday';

 artist                 | title1
------------------------+----------------
           Superpitcher |       Yesterday
            Hilary Duff |    So Yesterday
   The Mr. T Experience | Yesterday Rules
 
(3 rows)

SELECT artist,title1 FROM music.albums WHERE title2='Yesterday';

 artist                 | title1
------------------------+----------------
           Superpitcher |       Yesterday
            Hilary Duff |    So Yesterday
   The Mr. T Experience | Yesterday Rules
  
(3 rows)
{code}

The semantics of *=* are not respected. SASI should return only the 1 row that 
matches exactly. Using *LIKE* would return all 3 rows. This affects both 
*PREFIX* and *CONTAINS* modes. Using *NonTokenizingAnalyzer* returns 1 row with 
an exact match.

So the semantics of *=* depend on the chosen analyzer, which is inconsistent. 
We should force *=* to be an exact match no matter which analyzer is chosen.
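
For comparison, here is a minimal sketch of the non-tokenizing setup mentioned 
above. It assumes the StandardAnalyzer index on title1 has been dropped first, 
and the index name albums_title1_exact is hypothetical, chosen only for this 
illustration. The option values are my best guess at the setup I tried and are 
not copied from the test session.

{code:sql}
-- Assumption: the StandardAnalyzer index on title1 was dropped beforehand.
-- The index name below is hypothetical.
CREATE CUSTOM INDEX albums_title1_exact ON music.albums (title1) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = 
{'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false', 'mode': 'PREFIX'};

-- With this analyzer, = behaves as an exact match (1 row, per the report above).
SELECT artist,title1 FROM music.albums WHERE title1='Yesterday';
{code}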



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
