Re: QueryParser - proposed change may break existing queries.

Mark Harwood Thu, 17 Sep 2020 06:30:40 -0700

I think the decision comes down to choosing between silent
(mis)interpretations of ambiguous queries or noisy failures.


On Thu, Sep 17, 2020 at 1:55 PM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
>
>
> My idea would have been not to be too strict and instead only detect it
> as a regex if it's separated. So /foo/bar and /foo/iphone would both go
> through, ignoring the regex; only ‘/foo/ bar’ or ‘/foo/i phone’ would
> interpret the first token as regex.
>
>
>
> That’s just my idea, not sure if it makes sense to have this relaxed
> parsing. I was always very skeptical of adding the regexes, as it breaks
> many queries. Now it breaks even more.
>
>
>
> Uwe
>
>
>
> -----
>
> Uwe Schindler
>
> Achterdiek 19, D-28357 Bremen
>
> https://www.thetaphi.de
>
> eMail: u...@thetaphi.de
>
>
>
> *From:* Mark Harwood <markharw...@gmail.com>
> *Sent:* Wednesday, September 16, 2020 6:45 PM
> *To:* dev@lucene.apache.org
> *Subject:* Re: QueryParser - proposed change may break existing queries.
>
>
>
> The strictness I was thinking of adding was to make all of the following
> error:
>
>  /foo/bar
>
>  /foo//bar/
>
>  /foo/iphone
>
>  /foo/AND x
>
>
>
> These would be allowed:
>
>  /foo/i bar
>
>  (/foo/ OR /bar/)
>
>  (/foo/ OR /bar/i)
>
>  /foo/^2
>
>  /foo/i^2
>
>
>
>
>
>
>
> On 16 Sep 2020, at 12:00, Uwe Schindler <u...@thetaphi.de> wrote:
>
> 
>
> In my opinion, the proposed syntax change should require whitespace or
> another separator char after the regex “i” parameter.
>
>
>
> Uwe
>
>
>
> -----
>
> Uwe Schindler
>
> Achterdiek 19, D-28357 Bremen
>
> https://www.thetaphi.de
>
> eMail: u...@thetaphi.de
>
>
>
> *From:* Mark Harwood <markharw...@gmail.com>
> *Sent:* Wednesday, September 16, 2020 11:04 AM
> *To:* dev@lucene.apache.org
> *Subject:* QueryParser - proposed change may break existing queries.
>
>
>
> In LUCENE-9445 we'd like to add a case-insensitive option to regex queries
> in the query parser of the form:
>
>    /Foo/i
>
>
>
> However, today people can search for:
>
>
>
>    /foo.com/index.html
>
>
>
> and not get an error. The searcher may think this is a query for a URL but
> it's actually parsed as a regex "foo.com" ORed with a term query.
>
>
>
> I'd like to draw attention to this proposed change in behaviour because I
> think it could affect many existing systems. Arguably it may be a positive
> in drawing attention to a number of existing silent failures (unescaped
> searches for urls or file paths) but equally could be seen as a negative
> breaking change by some.
>
>
>
> What is our BWC policy for changes to query parser?
>
> Do the benefits of the proposed new regex feature outweigh the costs of
> the breakages in your view?
>
>
>
>
> https://issues.apache.org/jira/browse/LUCENE-9445?focusedCommentId=17196793&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17196793
>
>
>
>
>
>
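
To make the two options discussed above concrete, here is a minimal sketch that
applies a "strict" rule (a regex clause not followed by a separator is an error)
and a "relaxed" rule (a token only counts as a regex when it stands alone) to the
example queries from this thread. It is not Lucene's grammar or API: the names
strict_ok and relaxed_regex_tokens, the separator set, and the simplified regex
literal (no escaped slashes) are all assumptions made for illustration.

```python
import re

# Assumed simplification: a regex literal is /.../ without escaped slashes,
# optionally followed by the proposed "i" (case-insensitive) flag.
REGEX_LITERAL = re.compile(r"/[^/]*/i?")

# Assumed set of characters that may legally follow a regex clause:
# whitespace, a closing parenthesis, or "^" (boost).
SEPARATORS = set(" \t\n)^")


def strict_ok(query):
    """Strict rule: every regex literal must be followed by a separator
    or by the end of the query; otherwise the query is rejected."""
    pos = 0
    while True:
        match = REGEX_LITERAL.search(query, pos)
        if match is None:
            return True
        end = match.end()
        if end < len(query) and query[end] not in SEPARATORS:
            return False
        pos = end


# Relaxed rule: a whitespace-delimited token is a regex only if the whole
# token is a regex literal (optionally with a numeric boost).
REGEX_TOKEN = re.compile(r"/[^/]*/i?(\^\d+)?")


def relaxed_regex_tokens(query):
    """Return the tokens the relaxed rule would interpret as regexes;
    everything else passes through as ordinary terms."""
    return [tok for tok in query.split() if REGEX_TOKEN.fullmatch(tok)]


if __name__ == "__main__":
    for q in ["/foo/bar", "/foo/iphone", "/foo/i bar", "/foo.com/index.html"]:
        print(q, "| strict ok:", strict_ok(q),
              "| relaxed regexes:", relaxed_regex_tokens(q))
```

Under these assumptions the strict rule rejects /foo/bar, /foo/iphone and
/foo.com/index.html outright (the noisy failure), while the relaxed rule accepts
them and treats them as plain terms, recognising a regex only in inputs such as
'/foo/ bar' or '/foo/i phone'.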
