Hi!

We’ve talked about two items:
1) ‘%’ as a wildcard in the middle of LIKE pattern.
2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help of 
LIKE.

Item #1was resolved as CASSANDRA-12573.

Regarding to item #2: you said the following:
> A possible fix would be:
> 
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
> column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced regex 
> to exclude the %% from parsing e.g
> 
> Step 2) is dead easy but step 1) is harder because I don't know if converting 
> the bytebuffer into String at this stage of the CQL parser is expensive or 
> not (in term of computation)
> 
> Let me try a patch 

So is there any update on this?

Thanks,
Mikhail


> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy 
> <mikhail.krupits...@jetbrains.com> wrote:
> 
> Hi!
> 
> Have you had a chance to try your patch or solve the issue in an other way? 
> 
> Thanks,
> Mikhail
>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com 
>> <mailto:doanduy...@gmail.com>> wrote:
>> 
>> Ok so I've found the source of the issue, it's pretty well hidden because it 
>> is NOT in the SASI source code directly.
>> 
>> Here is the method where C* determines what kind of LIKE expression you're 
>> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>> 
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>  
>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
>> 
>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to 
>> remove escape character BEFORE doing the matching so if your search is LIKE 
>> '%%esc%', the detected expression is LIKE_CONTAINS.
>> 
>> A possible fix would be:
>> 
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
>> column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced 
>> regex to exclude the %% from parsing e.g
>> 
>> Step 2) is dead easy but step 1) is harder because I don't know if 
>> converting the bytebuffer into String at this stage of the CQL parser is 
>> expensive or not (in term of computation)
>> 
>> Let me try a patch  
>> 
>> 
>> 
>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com 
>> <mailto:doanduy...@gmail.com>> wrote:
>> Ok you're right, I get your point
>> 
>> LIKE '%%esc%' --> startWith('%esc')
>> 
>> LIKE 'escape%%' -->  = 'escape%'
>> 
>> What I strongly suspect is that in the source code of SASI, we parse the % 
>> xxx % expression BEFORE applying escape. That will explain the observed 
>> behavior. E.g:
>> 
>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>> 
>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>> 
>> Let me check in the source code and try to reproduce the issue
>> 
>> 
>> 
>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy 
>> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
>> wrote:
>> Looks like we have different understanding of what results are expected.
>> I based my understanding on 
>> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html 
>> <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
>> According to the doc ‘esc’ is a pattern for exact match and I guess that 
>> there is no semantical difference between two LIKE patterns (both of 
>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>> 
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>>> containing '%esc' so %escapeme is a possible match and also escape%esc
>> Why ‘containing’? I expect that it should be ’starting’..
>>> 
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results 
>>> starting with 'escape%' so escape%me is a valid result and also escape%esc
>> Why ’starting’? I expect that it should be ‘exact matching’.
>> 
>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns 
>> nothing (CASSANDRA-12573).
>> 
>> What I’m missing?
>> 
>> Thanks,
>> Mikhail
>> 
>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com 
>>> <mailto:doanduy...@gmail.com>> wrote:
>>> 
>>> CREATE CUSTOM INDEX ON test.escape(val) USING 
>>> 'org.apache.cassandra.index.sa 
>>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = 
>>> {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa 
>>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', 
>>> 'case_sensitive': 'false'};
>>> 
>>> I don't see any problem in the results you got
>>> 
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>>> containing '%esc' so %escapeme is a possible match and also escape%esc
>> Why ‘containing’? I expect that it should be ’starting’..
>>> 
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results 
>>> starting with 'escape%' so escape%me is a valid result and also escape%esc
>> Why ’starting’? I expect that it should be ‘exact matching’.
>> 
>>> 
>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy 
>>> <mikhail.krupits...@jetbrains.com 
>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>> Thanks for the reply.
>>> Could you please provide what index definition did you use?
>>> With the index from my script I get the following results:
>>> 
>>> cqlsh:test> select * from escape;
>>> 
>>>  id | val
>>> ----+-----------
>>>   1 | %escapeme
>>>   2 | escape%me
>>>   3 | escape%esc
>>> 
>>> Contains search
>>> 
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>> 
>>>  id | val
>>> ----+-----------
>>>   1 | %escapeme
>>>   3 | escape%esc
>>> (2 rows)
>>> 
>>> 
>>> Prefix search
>>> 
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>> 
>>>  id | val
>>> ----+-----------
>>>   2 | escape%me
>>>   3 | escape%esc
>>> 
>>> Thanks,
>>> Mikhail 
>>> 
>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduy...@gmail.com 
>>>> <mailto:doanduy...@gmail.com>> wrote:
>>>> 
>>>> Use % to escape %
>>>> 
>>>> cqlsh:test> select * from escape;
>>>> 
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>   2 | escape%me
>>>> 
>>>> 
>>>> Contains search
>>>> 
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>> 
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>> 
>>>> (1 rows)
>>>> 
>>>> 
>>>> Prefix search
>>>> 
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>> 
>>>>  id | val
>>>> ----+-----------
>>>>   2 | escape%me
>>>> 
>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy 
>>>> <mikhail.krupits...@jetbrains.com 
>>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>>> Hi Cassandra guys,
>>>> 
>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a 
>>>> search pattern.
>>>> Here is my test script:
>>>> 
>>>> DROP keyspace if exists kmv;
>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 
>>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>>> USE kmv;
>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
>>>> c1));
>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
>>>> 'org.apache.cassandra.index.sa 
>>>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {
>>>> 'analyzed' : 'true',
>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa 
>>>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer',
>>>> 'case_sensitive' : 'false',
>>>> 'mode' : 'CONTAINS'
>>>> };
>>>> 
>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>> 
>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>> 
>>>> _pattern_ '%%%' finds all columns that contain %.
>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>> How to find columns that end with ‘%’?
>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ 
>>>> char form % as a command symbol? (Also there is a related issue 
>>>> CASSANDRA-12573).
>>>> 
>>>> 
>>>> Thanks,
>>>> Mikhail
>>>> 
>>> 
>>> 
>> 
>> 
>> 
> 

Reply via email to