Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Jim Ancona Thu, 22 Sep 2016 08:52:20 -0700

To answer DuyHai's question without introducing new syntax, I'd suggest:

LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape'


So the first two %'s are translated to a literal, non-wildcard % and the
third % is a wildcard because it's not doubled.

Jim

On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy <
mikhail.krupits...@jetbrains.com> wrote:

> I guess that it should be similar to how it is done in SQL for LIKE
> patterns.
>
> You can introduce an escape character, e.g. ‘\’.
> Examples:
> ‘%’ - any string
> ‘\%’ - equal to ‘%’ character
> ‘\%foo%’ - starts from ‘%foo’
> ‘%%%escape’ - ends with ’escape’
> ‘\%%’ - starts from ‘%’
> ‘\\\%%’ - starts from ‘\%’ .
>
> What do you think?
>
> Thanks,
> Mikhail
>
> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> Hello Mikhail
>
> It's more complicated that it seems
>
> LIKE '%%escape' means  EQUAL TO '%escape'
>
> LIKE '%escape' means ENDS WITH 'escape'
>
> What's about LIKE '%%%escape' ????
>
> How should we treat this case ? Replace %% by % at the beginning of the
> searched term ??
>
>
>
> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <
> mikhail.krupits...@jetbrains.com> wrote:
>
>> Hi!
>>
>> We’ve talked about two items:
>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with
>> help of LIKE.
>>
>> Item #1was resolved as CASSANDRA-12573.
>>
>> Regarding to item #2: you said the following:
>>
>> A possible fix would be:
>>
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>> the column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced
>> regex to exclude the %% from parsing e.g
>>
>> Step 2) is dead easy but step 1) is harder because I don't know if
>> converting the bytebuffer into String at this stage of the CQL parser is
>> expensive or not (in term of computation)
>>
>> Let me try a patch
>>
>> So is there any update on this?
>>
>> Thanks,
>> Mikhail
>>
>>
>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <
>> mikhail.krupits...@jetbrains.com> wrote:
>>
>> Hi!
>>
>> Have you had a chance to try your patch or solve the issue in an other
>> way?
>>
>> Thanks,
>> Mikhail
>>
>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>> Ok so I've found the source of the issue, it's pretty well hidden because
>> it is NOT in the SASI source code directly.
>>
>> Here is the method where C* determines what kind of LIKE expression
>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/
>> apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>
>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget
>> to remove escape character BEFORE doing the matching so if your search is 
>> LIKE
>> '%%esc%', the detected expression is LIKE_CONTAINS.
>>
>> A possible fix would be:
>>
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>> the column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced
>> regex to exclude the %% from parsing e.g
>>
>> Step 2) is dead easy but step 1) is harder because I don't know if
>> converting the bytebuffer into String at this stage of the CQL parser is
>> expensive or not (in term of computation)
>>
>> Let me try a patch
>>
>>
>>
>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>>> Ok you're right, I get your point
>>>
>>> LIKE '%%esc%' --> startWith('%esc')
>>>
>>> LIKE 'escape%%' -->  = 'escape%'
>>>
>>> What I strongly suspect is that in the source code of SASI, we parse the
>>> % xxx % expression BEFORE applying escape. That will explain the observed
>>> behavior. E.g:
>>>
>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>>
>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>>
>>> Let me check in the source code and try to reproduce the issue
>>>
>>>
>>>
>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>>> mikhail.krupits...@jetbrains.com> wrote:
>>>
>>>> Looks like we have different understanding of what results are expected.
>>>> I based my understanding on http://docs.datastax.com/en
>>>> /cql/3.3/cql/cql_using/useSASIIndex.html
>>>> According to the doc ‘esc’ is a pattern for exact match and I guess
>>>> that there is no semantical difference between two LIKE patterns (both of
>>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>>
>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>>> *%esc*
>>>>
>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>
>>>>
>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>>> *escape%*esc
>>>>
>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>
>>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
>>>> returns nothing (CASSANDRA-12573).
>>>>
>>>> What I’m missing?
>>>>
>>>> Thanks,
>>>> Mikhail
>>>>
>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>
>>>> CREATE CUSTOM INDEX ON test.escape(val) USING '
>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
>>>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa
>>>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>>
>>>> I don't see any problem in the results you got
>>>>
>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>>> *%esc*
>>>>
>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>
>>>>
>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>>> *escape%*esc
>>>>
>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>
>>>>
>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
>>>> mikhail.krupits...@jetbrains.com> wrote:
>>>>
>>>>> Thanks for the reply.
>>>>> Could you please provide what index definition did you use?
>>>>> With the index from my script I get the following results:
>>>>>
>>>>> cqlsh:test> select * from escape;
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>   2 | escape%me
>>>>> *  3 | escape%esc*
>>>>>
>>>>> Contains search
>>>>>
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>   3
>>>>> * | escape%esc*(2 rows)
>>>>>
>>>>>
>>>>> Prefix search
>>>>>
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   2 | escape%me
>>>>>   3
>>>>> * | escape%esc*
>>>>>
>>>>> Thanks,
>>>>> Mikhail
>>>>>
>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>
>>>>> Use % to escape %
>>>>>
>>>>> cqlsh:test> select * from escape;
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>   2 | escape%me
>>>>>
>>>>>
>>>>> Contains search
>>>>>
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>
>>>>> (1 rows)
>>>>>
>>>>>
>>>>> Prefix search
>>>>>
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   2 | escape%me
>>>>>
>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>>>>> mikhail.krupits...@jetbrains.com> wrote:
>>>>>
>>>>>> Hi Cassandra guys,
>>>>>>
>>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in
>>>>>> a search pattern.
>>>>>> Here is my test script:
>>>>>>
>>>>>> DROP keyspace if exists kmv;
>>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
>>>>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>>> USE kmv;
>>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY
>>>>>> KEY(id, c1));
>>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>>>>>> 'analyzed' : 'true',
>>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>>>>>> si.analyzer.NonTokenizingAnalyzer',
>>>>>> 'case_sensitive' : 'false',
>>>>>> 'mode' : 'CONTAINS'
>>>>>> };
>>>>>>
>>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>>>
>>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>>>
>>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>>> How to find columns that end with ‘%’?
>>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate
>>>>>> ‘%’ char form % as a command symbol? (Also there is a related issue
>>>>>> CASSANDRA-12573).
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Mikhail
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Reply via email to