You could avoid (some of?) these problems by supporting /(?i)foo/ instead of /foo/i
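
For illustration only, here is how an inline (?i) flag behaves in Java's java.util.regex. This is not Lucene's own regex implementation, whose syntax is separate, and the names InlineFlagDemo and matches are made up for this sketch. The point is that the flag sits inside the regex body, so nothing has to follow the closing slash.

```java
import java.util.regex.Pattern;

final class InlineFlagDemo {
    // Returns true when the whole input matches the given pattern.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Matching is case sensitive unless a flag says otherwise.
        System.out.println(matches("foo", "FOO"));
        // The inline (?i) flag switches on case-insensitive matching.
        System.out.println(matches("(?i)foo", "FOO"));
    }
}
```
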
--
Steve

> On Sep 17, 2020, at 1:55 PM, Gus Heck <[email protected]> wrote:
>
> And as I understand it, current behavior is the silent misinterpretation. To
> me, the failure to require a space after the regex (and either not become a
> regex in that case or complain about invalid regex) might be considered a
> bug...
>
> On Thu, Sep 17, 2020 at 9:30 AM Mark Harwood <[email protected]> wrote:
> I think the decision comes down to choosing between silent
> (mis)interpretations of ambiguous queries or noisy failures.
>
> On Thu, Sep 17, 2020 at 1:55 PM Uwe Schindler <[email protected]> wrote:
> Hi,
>
> My idea would have been not to be too strict and instead only detect it as a
> regex if it’s separated. So /foo/bar and /foo/iphone would both go through,
> ignoring the regex; only ‘/foo/ bar’ or ‘/foo/i phone’ would interpret the
> first token as a regex.
>
> That’s just my idea, not sure if it makes sense to have this relaxed parsing.
> I was always very skeptical of adding the regexes, as it breaks many queries.
> Now it’s even more.
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: [email protected]
>
> From: Mark Harwood <[email protected]>
> Sent: Wednesday, September 16, 2020 6:45 PM
> To: [email protected]
> Subject: Re: QueryParser - proposed change may break existing queries.
>
> The strictness I was thinking of adding was to make all of the following
> error:
>
> /foo/bar
> /foo//bar/
> /foo/iphone
> /foo/AND x
>
> These would be allowed:
>
> /foo/i bar
> (/foo/ OR /bar/)
> (/foo/ OR /bar/i)
> /foo/^2
> /foo/i^2
>
> On 16 Sep 2020, at 12:00, Uwe Schindler <[email protected]> wrote:
>
> In my opinion, the proposed syntax change should enforce having whitespace
> or any other separator char after the regex “i” parameter.
>
> Uwe
>
> From: Mark Harwood <[email protected]>
> Sent: Wednesday, September 16, 2020 11:04 AM
> To: [email protected]
> Subject: QueryParser - proposed change may break existing queries.
>
> In LUCENE-9445 we'd like to add a case insensitive option to regex queries in
> the query parser of the form:
>
> /Foo/i
>
> However, today people can search for:
>
> /foo.com/index.html
>
> and not get an error. The searcher may think this is a query for a URL but
> it's actually parsed as a regex "foo.com" ORed with a term query.
>
> I'd like to draw attention to this proposed change in behaviour because I
> think it could affect many existing systems. Arguably it may be a positive in
> drawing attention to a number of existing silent failures (unescaped searches
> for urls or file paths) but equally could be seen as a negative breaking
> change by some.
>
> What is our BWC policy for changes to query parser?
>
> Do the benefits of the proposed new regex feature outweigh the costs of the
> breakages in your view?
>
> https://issues.apache.org/jira/browse/LUCENE-9445?focusedCommentId=17196793&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17196793
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
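
The strict rule Mark describes in the quoted thread (a regex, with or without the trailing i, must be followed by a separator) can be sketched roughly as below. This is only a standalone approximation using java.util.regex, not the query parser's actual grammar. The class name StrictRegexSeparatorSketch, the method isAccepted, and the choice of whitespace, ')' and '^' as the only separators are assumptions read off the examples in the thread. It also ignores quoting, field prefixes and escaping outside the regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StrictRegexSeparatorSketch {
    // A regex literal: '/', a body that may contain backslash escapes,
    // a closing '/', then an optional 'i' flag.
    private static final Pattern REGEX_LITERAL =
        Pattern.compile("/(?:[^/\\\\]|\\\\.)*/i?");

    // Characters this sketch allows directly after a regex literal (assumed set).
    private static boolean isSeparator(char c) {
        return Character.isWhitespace(c) || c == ')' || c == '^';
    }

    // Returns false if any regex literal in the query is followed by a
    // non-separator character; end of input counts as a separator.
    static boolean isAccepted(String query) {
        Matcher m = REGEX_LITERAL.matcher(query);
        int from = 0;
        while (m.find(from)) {
            int end = m.end();
            if (end < query.length() && !isSeparator(query.charAt(end))) {
                return false;
            }
            from = end; // keep scanning after this literal
        }
        return true;
    }

    public static void main(String[] args) {
        String[] queries = {
            "/foo/bar", "/foo//bar/", "/foo/iphone", "/foo/AND x",
            "/foo/i bar", "(/foo/ OR /bar/)", "(/foo/ OR /bar/i)",
            "/foo/^2", "/foo/i^2", "/foo.com/index.html"
        };
        for (String q : queries) {
            System.out.println(q + " -> " + (isAccepted(q) ? "allowed" : "error"));
        }
    }
}
```

Under these assumptions the first four queries are rejected and the next five are allowed, matching the two lists in the thread. The URL-style query /foo.com/index.html is also rejected rather than silently parsed as a regex plus a term.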
