[jira] [Comment Edited] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

Maxim Podkolzine (JIRA) Sun, 18 Sep 2016 10:03:30 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15501293#comment-15501293
 ]


Maxim Podkolzine edited comment on CASSANDRA-12573 at 9/18/16 5:03 PM:
-----------------------------------------------------------------------

> SASI initially support multiple predicates, something like : WHERE ((col1=xxx 
> OR col2=yyy) AND (col3 LIKE '%zzz')) but it is not merged yet into the 3.x 
> trunk
That's good news. When do you plan to merge it?

> Wrong, a bug is something that does not work as expected e.g that does not 
> work as documented.
As a customer I have a slightly different view on this. My expectations are 
based on prior experience and common sense.
I understand when certain features that are usual in other products are not 
implemented by design. This is obviously not the case.
My current impression is that this feature is half-baked and not well tested. 
But it's just my opinion.

I think I have a stronger argument that this is a bug. I have created a DB and 
filled it with some data from my disk:
{code}
CREATE KEYSPACE Excelsior   WITH REPLICATION = { 'class' : 'SimpleStrategy', 
'replication_factor' : 3 };
use excelsior;
create table demo (id text primary key, name text, content text);
CREATE CUSTOM INDEX name_index ON demo (name) USING 
'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
 'mode': 'CONTAINS',
 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
 'analyzed': 'true'
};
{code}

After that I run the queries with '%' inside. As you can see multi-patterns are 
handled by AND:
{code}
cqlsh:excelsior> select id, name from demo where name like '%RevisionDiff%';

 id                                   | name
--------------------------------------+----------------------------
 93dce11a-cfdd-4c16-b3b3-7537c7af03ec | RevisionDiffType.java
 6586058f-bd57-4fc7-ae12-e6d8ddcd2ceb | RevisionDiffItem.java
 d16dff53-002b-4fe6-9a10-bb32425360e0 | RevisionDiffItemDTO.java
 bb20981e-714f-4eac-802f-6191dba5a301 | GetRevisionDiff.java
 1c53574b-2eea-46f8-bcbc-5e295ef9c70a | RevisionDiffItemDTO.java
 7366f852-d63c-4d07-86b3-18a3bf47e79b | RevisionDiffItemDTO.java
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(7 rows)
cqlsh:excelsior> select id, name from demo where name like '%ItemImpl%';

 id                                   | name
--------------------------------------+---------------------------
 603c1d12-4871-4244-896a-54ddb76dbd3b | FastTreeItemImpl.java
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(2 rows)
cqlsh:excelsior> select id, name from demo where name like 
'%RevisionDiff%ItemImpl%';

 id                                   | name
--------------------------------------+--------------------------
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(1 rows)
{code}


was (Author: maximp):
> SASI initially support multiple predicates, something like : WHERE ((col1=xxx 
> OR col2=yyy) AND (col3 LIKE '%zzz')) but it is not merged yet into the 3.x 
> trunk
That's good news. When do you plan to merge it?

> Wrong, a bug is something that does not work as expected e.g that does not 
> work as documented.
As a customer I have a slightly different view on this. My expectations are 
based on prior experience and common sense.
I understand when certain features that are usual in other products are not 
implemented by design. This is obviously not the case.
My current impression is that this feature is half-baked and not well tested. 
But it's just my opinion.

I think I have a stronger argument that this is a bug. I have created a DB and 
filled it with some data from my disk:
```
CREATE KEYSPACE Excelsior   WITH REPLICATION = { 'class' : 'SimpleStrategy', 
'replication_factor' : 3 };
use excelsior;
create table demo (id text primary key, name text, content text);
CREATE CUSTOM INDEX name_index ON demo (name) USING 
'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
 'mode': 'CONTAINS',
 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
 'analyzed': 'true'
};
```

After that I run the queries with '%' inside. As you can see multi-patterns are 
handled by AND:
```
cqlsh:excelsior> select id, name from demo where name like '%RevisionDiff%';

 id                                   | name
--------------------------------------+----------------------------
 93dce11a-cfdd-4c16-b3b3-7537c7af03ec | RevisionDiffType.java
 6586058f-bd57-4fc7-ae12-e6d8ddcd2ceb | RevisionDiffItem.java
 d16dff53-002b-4fe6-9a10-bb32425360e0 | RevisionDiffItemDTO.java
 bb20981e-714f-4eac-802f-6191dba5a301 | GetRevisionDiff.java
 1c53574b-2eea-46f8-bcbc-5e295ef9c70a | RevisionDiffItemDTO.java
 7366f852-d63c-4d07-86b3-18a3bf47e79b | RevisionDiffItemDTO.java
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(7 rows)
cqlsh:excelsior> select id, name from demo where name like '%ItemImpl%';

 id                                   | name
--------------------------------------+---------------------------
 603c1d12-4871-4244-896a-54ddb76dbd3b | FastTreeItemImpl.java
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(2 rows)
cqlsh:excelsior> select id, name from demo where name like 
'%RevisionDiff%ItemImpl%';

 id                                   | name
--------------------------------------+--------------------------
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(1 rows)
```

> SASI index. Incorrect results for '%foo%bar%'-like search pattern. 
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-12573
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12573
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Mikhail Krupitskiy
>            Priority: Critical
>              Labels: sasi
>
> We use Cassandra 3.7 and have faced a strange behaviour of SELECT requests 
> with "LIKE '%foo%bar%'" constraints on a column with SASI index.
> Below are few experiments that show this behaviour.
> Experiment 1:
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: no rows.
> Experiment 2 (NOTE: definition of index is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: asdqwe, qweasd, qwea1.
> Experiment 3 (NOTE: primary key is compound now and inserted data was 
> changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe.
> Experiment 4 (NOTE: search criteria is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w22%a%';
> {noformat}
> Expected result: no rows.
> Actual result: qweasd, qwea1, asdqwe.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

Reply via email to