Hi! We’ve talked about two items: 1) ‘%’ as a wildcard in the middle of LIKE pattern. 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help of LIKE.
Item #1was resolved as CASSANDRA-12573. Regarding to item #2: you said the following: > A possible fix would be: > > 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the > column data type) > 2) remove the escape character e.g. before parsing OR use some advanced regex > to exclude the %% from parsing e.g > > Step 2) is dead easy but step 1) is harder because I don't know if converting > the bytebuffer into String at this stage of the CQL parser is expensive or > not (in term of computation) > > Let me try a patch So is there any update on this? Thanks, Mikhail > On 20 Sep 2016, at 18:38, Mikhail Krupitskiy > <mikhail.krupits...@jetbrains.com> wrote: > > Hi! > > Have you had a chance to try your patch or solve the issue in an other way? > > Thanks, > Mikhail >> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com >> <mailto:doanduy...@gmail.com>> wrote: >> >> Ok so I've found the source of the issue, it's pretty well hidden because it >> is NOT in the SASI source code directly. >> >> Here is the method where C* determines what kind of LIKE expression you're >> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES) >> >> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778 >> >> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778> >> >> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to >> remove escape character BEFORE doing the matching so if your search is LIKE >> '%%esc%', the detected expression is LIKE_CONTAINS. >> >> A possible fix would be: >> >> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the >> column data type) >> 2) remove the escape character e.g. before parsing OR use some advanced >> regex to exclude the %% from parsing e.g >> >> Step 2) is dead easy but step 1) is harder because I don't know if >> converting the bytebuffer into String at this stage of the CQL parser is >> expensive or not (in term of computation) >> >> Let me try a patch >> >> >> >> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com >> <mailto:doanduy...@gmail.com>> wrote: >> Ok you're right, I get your point >> >> LIKE '%%esc%' --> startWith('%esc') >> >> LIKE 'escape%%' --> = 'escape%' >> >> What I strongly suspect is that in the source code of SASI, we parse the % >> xxx % expression BEFORE applying escape. That will explain the observed >> behavior. E.g: >> >> LIKE '%%esc%' parsed as %xxx% where xxx = %esc >> >> LIKE 'escape%%' parsed as xxx% where xxx =escape% >> >> Let me check in the source code and try to reproduce the issue >> >> >> >> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy >> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> >> wrote: >> Looks like we have different understanding of what results are expected. >> I based my understanding on >> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html >> <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html> >> According to the doc ‘esc’ is a pattern for exact match and I guess that >> there is no semantical difference between two LIKE patterns (both of >> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’. >> >>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results >>> containing '%esc' so %escapeme is a possible match and also escape%esc >> Why ‘containing’? I expect that it should be ’starting’.. >>> >>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results >>> starting with 'escape%' so escape%me is a valid result and also escape%esc >> Why ’starting’? I expect that it should be ‘exact matching’. >> >> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns >> nothing (CASSANDRA-12573). >> >> What I’m missing? >> >> Thanks, >> Mikhail >> >>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com >>> <mailto:doanduy...@gmail.com>> wrote: >>> >>> CREATE CUSTOM INDEX ON test.escape(val) USING >>> 'org.apache.cassandra.index.sa >>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = >>> {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa >>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', >>> 'case_sensitive': 'false'}; >>> >>> I don't see any problem in the results you got >>> >>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results >>> containing '%esc' so %escapeme is a possible match and also escape%esc >> Why ‘containing’? I expect that it should be ’starting’.. >>> >>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results >>> starting with 'escape%' so escape%me is a valid result and also escape%esc >> Why ’starting’? I expect that it should be ‘exact matching’. >> >>> >>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy >>> <mikhail.krupits...@jetbrains.com >>> <mailto:mikhail.krupits...@jetbrains.com>> wrote: >>> Thanks for the reply. >>> Could you please provide what index definition did you use? >>> With the index from my script I get the following results: >>> >>> cqlsh:test> select * from escape; >>> >>> id | val >>> ----+----------- >>> 1 | %escapeme >>> 2 | escape%me >>> 3 | escape%esc >>> >>> Contains search >>> >>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%'; >>> >>> id | val >>> ----+----------- >>> 1 | %escapeme >>> 3 | escape%esc >>> (2 rows) >>> >>> >>> Prefix search >>> >>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%'; >>> >>> id | val >>> ----+----------- >>> 2 | escape%me >>> 3 | escape%esc >>> >>> Thanks, >>> Mikhail >>> >>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduy...@gmail.com >>>> <mailto:doanduy...@gmail.com>> wrote: >>>> >>>> Use % to escape % >>>> >>>> cqlsh:test> select * from escape; >>>> >>>> id | val >>>> ----+----------- >>>> 1 | %escapeme >>>> 2 | escape%me >>>> >>>> >>>> Contains search >>>> >>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%'; >>>> >>>> id | val >>>> ----+----------- >>>> 1 | %escapeme >>>> >>>> (1 rows) >>>> >>>> >>>> Prefix search >>>> >>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%'; >>>> >>>> id | val >>>> ----+----------- >>>> 2 | escape%me >>>> >>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy >>>> <mikhail.krupits...@jetbrains.com >>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote: >>>> Hi Cassandra guys, >>>> >>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a >>>> search pattern. >>>> Here is my test script: >>>> >>>> DROP keyspace if exists kmv; >>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : >>>> 'SimpleStrategy', 'replication_factor':'1'} ; >>>> USE kmv; >>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, >>>> c1)); >>>> CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING >>>> 'org.apache.cassandra.index.sa >>>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = { >>>> 'analyzed' : 'true', >>>> 'analyzer_class' : 'org.apache.cassandra.index.sa >>>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', >>>> 'case_sensitive' : 'false', >>>> 'mode' : 'CONTAINS' >>>> }; >>>> >>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd'); >>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd'); >>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%'); >>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1'); >>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd'); >>>> >>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_'; >>>> >>>> _pattern_ '%%%' finds all columns that contain %. >>>> How to find columns that start form ‘%’ or ‘%a’? >>>> How to find columns that end with ‘%’? >>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ >>>> char form % as a command symbol? (Also there is a related issue >>>> CASSANDRA-12573). >>>> >>>> >>>> Thanks, >>>> Mikhail >>>> >>> >>> >> >> >> >