That's a much better idea, I like it. It's basically what Java's regex parser in the Pattern class also does.
If we do this we won't even need a syntax change.

Uwe

On September 17, 2020 7:09:18 PM UTC, Steve Rowe <[email protected]> wrote:
>You could avoid (some of?) these problems by supporting /(?i)foo/
>instead of /foo/i
>
>--
>Steve
>
>> On Sep 17, 2020, at 1:55 PM, Gus Heck <[email protected]> wrote:
>>
>> And as I understand it, current behavior is the silent
>> misinterpretation. To me, the failure to require a space after the
>> regex (and either not become a regex in that case or complain about
>> invalid regex) might be considered a bug...
>>
>> On Thu, Sep 17, 2020 at 9:30 AM Mark Harwood <[email protected]> wrote:
>> I think the decision comes down to choosing between silent
>> (mis)interpretations of ambiguous queries or noisy failures.
>>
>> On Thu, Sep 17, 2020 at 1:55 PM Uwe Schindler <[email protected]> wrote:
>> Hi,
>>
>> My idea would have been not to be too strict and instead only detect
>> it as a regex if it's separated. So /foo/bar and /foo/iphone would both
>> go through, ignoring the regex; only ‘/foo/ bar’ or ‘/foo/i phone’
>> would interpret the first token as a regex.
>>
>> That’s just my idea, not sure if it makes sense to have this relaxed
>> parsing. I was always very skeptical of adding the regexes, as it
>> breaks many queries. Now it’s even more.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> https://www.thetaphi.de
>> eMail: [email protected]
>>
>> From: Mark Harwood <[email protected]>
>> Sent: Wednesday, September 16, 2020 6:45 PM
>> To: [email protected]
>> Subject: Re: QueryParser - proposed change may break existing queries.
>>
>> The strictness I was thinking of adding was to make all of the
>> following error:
>>
>> /foo/bar
>> /foo//bar/
>> /foo/iphone
>> /foo/AND x
>>
>> These would be allowed:
>>
>> /foo/i bar
>> (/foo/ OR /bar/)
>> (/foo/ OR /bar/i)
>> /foo/^2
>> /foo/i^2
>>
>> On 16 Sep 2020, at 12:00, Uwe Schindler <[email protected]> wrote:
>>
>> In my opinion, the proposed syntax change should require
>> whitespace or any other separator char after the regex “i” parameter.
>>
>> Uwe
>>
>> From: Mark Harwood <[email protected]>
>> Sent: Wednesday, September 16, 2020 11:04 AM
>> To: [email protected]
>> Subject: QueryParser - proposed change may break existing queries.
>>
>> In LUCENE-9445 we'd like to add a case insensitive option to regex
>> queries in the query parser of the form:
>>
>> /Foo/i
>>
>> However, today people can search for:
>>
>> /foo.com/index.html
>>
>> and not get an error. The searcher may think this is a query for a
>> URL but it's actually parsed as a regex "foo.com" ORed with a term
>> query.
>>
>> I'd like to draw attention to this proposed change in behaviour
>> because I think it could affect many existing systems. Arguably it may
>> be a positive in drawing attention to a number of existing silent
>> failures (unescaped searches for urls or file paths) but equally could
>> be seen as a negative breaking change by some.
>>
>> What is our BWC policy for changes to query parser?
>>
>> Do the benefits of the proposed new regex feature outweigh the costs
>> of the breakages in your view?
>>
>> https://issues.apache.org/jira/browse/LUCENE-9445?focusedCommentId=17196793&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17196793
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
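
For reference, the inline form /(?i)foo/ mentioned in the thread mirrors the embedded flag syntax that java.util.regex.Pattern accepts. The sketch below exercises only the JDK's Pattern class; it says nothing about what Lucene's own regex implementation or query parser supports, and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only java.util.regex.Pattern is a real API here.
class InlineFlagDemo {

    // Case-insensitive match requested inside the pattern text via (?i).
    static boolean matchesInline(String input) {
        return Pattern.compile("(?i)foo").matcher(input).matches();
    }

    // The same behaviour requested through a compile-time flag instead.
    static boolean matchesWithFlag(String input) {
        return Pattern.compile("foo", Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // Default behaviour: matching is case sensitive.
    static boolean matchesCaseSensitive(String input) {
        return Pattern.compile("foo").matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesInline("FOO"));
        System.out.println(matchesWithFlag("Foo"));
        System.out.println(matchesCaseSensitive("FOO"));
    }
}
```

Because the flag lives inside the delimiters, nothing follows the closing slash, which is why this form would not need new query syntax.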

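
The strict rule proposed in the thread (a regex, with its optional i flag, must be followed by a separator) can be sketched as a standalone check. This is an assumption-laden illustration, not the Lucene query parser: the class name, the method name, and the choice of separators (end of input, whitespace, ')' or '^') are inferred from the accepted and rejected examples listed in the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker; it does not use or reproduce any Lucene code.
class StrictRegexCheck {

    // '/', a body with no unescaped '/', a closing '/', then an optional 'i' flag.
    private static final Pattern LITERAL =
            Pattern.compile("/(?:[^/\\\\]|\\\\.)*/i?");

    // Returns false if any regex literal is followed directly by a non-separator.
    static boolean isAccepted(String query) {
        Matcher m = LITERAL.matcher(query);
        int pos = 0;
        while (m.find(pos)) {
            int end = m.end();
            if (end < query.length()) {
                char next = query.charAt(end);
                boolean separator =
                        Character.isWhitespace(next) || next == ')' || next == '^';
                if (!separator) {
                    return false; // e.g. /foo/bar or /foo/iphone
                }
            }
            pos = end; // continue scanning after this literal
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {
            "/foo/bar", "/foo//bar/", "/foo/iphone", "/foo/AND x",
            "/foo/i bar", "(/foo/ OR /bar/)", "(/foo/ OR /bar/i)",
            "/foo/^2", "/foo/i^2", "/foo.com/index.html"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (isAccepted(s) ? "accepted" : "error"));
        }
    }
}
```

Under these assumptions the URL-like query /foo.com/index.html is rejected rather than silently split into a regex plus a term, which is the "noisy failure" side of the trade-off discussed above.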