RE: QueryParser - proposed change may break existing queries.

Uwe Schindler Thu, 17 Sep 2020 05:55:40 -0700

Hi,


My idea would have been not to be too strict and instead only detect it as a
regex if it's separated. So /foo/bar and /foo/iphone would both go through,
ignoring the regex; only ‘/foo/ bar’ or ‘/foo/i phone’ would interpret the
first token as a regex.
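A minimal Java sketch of this relaxed rule follows. It assumes the proposed lowercase "i" flag and treats only whitespace or end of input as a "separator". It does not handle escaped slashes. `RelaxedRegexDetect` and `leadingRegex` are hypothetical names for illustration, not Lucene's QueryParser code.

```java
/**
 * Hypothetical sketch of the relaxed rule: a leading /.../ (with an optional
 * trailing "i" flag) is treated as a regex only when it is followed by
 * whitespace or the end of the input. Otherwise the input is left as plain
 * text and no error is raised. Escaped slashes are not handled.
 */
public final class RelaxedRegexDetect {

    /** Returns the regex body if the query begins with a separated regex, else null. */
    public static String leadingRegex(String query) {
        if (query.isEmpty() || query.charAt(0) != '/') {
            return null; // does not start like a regex at all
        }
        int close = query.indexOf('/', 1);
        if (close < 0) {
            return null; // no closing slash
        }
        int next = close + 1;
        if (next < query.length() && query.charAt(next) == 'i') {
            next++; // skip the proposed case-insensitive flag
        }
        boolean separated = next == query.length() || Character.isWhitespace(query.charAt(next));
        return separated ? query.substring(1, close) : null;
    }

    public static void main(String[] args) {
        for (String q : new String[] {"/foo/bar", "/foo/iphone", "/foo/ bar", "/foo/i phone"}) {
            String regex = leadingRegex(q);
            System.out.println(q + " -> " + (regex == null ? "plain text" : "regex: " + regex));
        }
    }
}
```

Under these assumptions, /foo/bar and /foo/iphone fall through as plain text, while ‘/foo/ bar’ and ‘/foo/i phone’ yield the regex "foo".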

 

That’s just my idea; I’m not sure it makes sense to have this relaxed parsing. I
was always very skeptical of adding the regexes, as they break many queries, and
this change would break even more.

 

Uwe

 

-----

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: [email protected]

 

From: Mark Harwood <[email protected]> 
Sent: Wednesday, September 16, 2020 6:45 PM
To: [email protected]
Subject: Re: QueryParser - proposed change may break existing queries.

 

The strictness I was thinking of adding was to make all of the following raise an error:

 /foo/bar

 /foo//bar/

 /foo/iphone 

 /foo/AND x

 

These would be allowed:

 /foo/i bar

 (/foo/ OR /bar/)

 (/foo/ OR /bar/i)

 /foo/^2

 /foo/i^2
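The two lists above amount to one check on whatever follows a regex's closing slash. Below is a minimal Java sketch of that check. It assumes an optional lowercase "i" flag, which must then be followed by end of input, whitespace, ‘)’ or ‘^’. It also assumes a ‘/’ opens a regex only at the start of input, after whitespace, or after ‘(’. That is a simplification of the real grammar. `StrictRegexCheck` and `isValid` are hypothetical names, not Lucene's QueryParser API.

```java
/**
 * Hypothetical sketch of the stricter rule: after a regex's closing slash,
 * allow an optional "i" flag, then require end of input, whitespace, ')' or '^'.
 * A simplification for illustration, not Lucene's actual QueryParser grammar.
 */
public final class StrictRegexCheck {

    // Simplifying assumption: '/' opens a regex only at the start of input,
    // after whitespace, or after '('.
    private static boolean startsClause(String q, int i) {
        return i == 0 || Character.isWhitespace(q.charAt(i - 1)) || q.charAt(i - 1) == '(';
    }

    // Characters permitted right after a regex (and its optional flag).
    private static boolean isSeparator(String q, int i) {
        if (i >= q.length()) return true; // end of input
        char c = q.charAt(i);
        return Character.isWhitespace(c) || c == ')' || c == '^';
    }

    /** Returns true if every regex literal in the query is properly terminated. */
    public static boolean isValid(String q) {
        int i = 0;
        while (i < q.length()) {
            if (q.charAt(i) == '/' && startsClause(q, i)) {
                // Find the closing slash, skipping backslash-escaped characters.
                int j = i + 1;
                while (j < q.length() && q.charAt(j) != '/') {
                    j += (q.charAt(j) == '\\') ? 2 : 1;
                }
                if (j >= q.length()) return false; // unterminated regex
                int k = j + 1;
                if (k < q.length() && q.charAt(k) == 'i') k++; // optional flag
                if (!isSeparator(q, k)) return false;
                i = k;
            } else {
                i++;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] queries = {"/foo/bar", "/foo//bar/", "/foo/iphone", "/foo/AND x",
                "/foo/i bar", "(/foo/ OR /bar/)", "(/foo/ OR /bar/i)", "/foo/^2", "/foo/i^2"};
        for (String q : queries) {
            System.out.println(q + " -> " + (isValid(q) ? "allowed" : "error"));
        }
    }
}
```

With these assumptions, the first four queries are rejected and the last five are accepted, matching the two lists. /foo.com/index.html is also rejected, because ‘n’ follows the "i".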


On 16 Sep 2020, at 12:00, Uwe Schindler <[email protected]> wrote:



In my opinion, the proposed syntax change should require whitespace or some
other separator char after the regex “i” parameter.

 

Uwe

 


 

From: Mark Harwood <[email protected]> 
Sent: Wednesday, September 16, 2020 11:04 AM
To: [email protected] 
Subject: QueryParser - proposed change may break existing queries.

 

In LUCENE-9445 we'd like to add a case-insensitive option to regex queries in 
the query parser, of the form: 

   /Foo/i

 

However, today people can search for:

 

   /foo.com/index.html

 

and not get an error. The searcher may think this is a query for a URL, but it's 
actually parsed as a regex "foo.com" ORed with a term query.
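To show why the URL goes through silently, here is a toy tokenizer in Java. It assumes a greatly simplified grammar: an unescaped ‘/’ opens a regex that runs to the next ‘/’, and any other run of characters up to whitespace or ‘/’ is a term. It ignores fields, operators, escaping and analysis. `CurrentSplitDemo` and `clauses` are hypothetical names; this is not Lucene's QueryParser implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration (hypothetical, greatly simplified) of how a query like
 * /foo.com/index.html splits into a regex clause followed by a term clause.
 */
public final class CurrentSplitDemo {

    /** Splits the query into REGEXP(...) and TERM(...) clauses. */
    public static List<String> clauses(String q) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < q.length()) {
            char c = q.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates clauses
            } else if (c == '/') {
                int close = q.indexOf('/', i + 1);
                if (close < 0) throw new IllegalArgumentException("unterminated regex at " + i);
                out.add("REGEXP(" + q.substring(i + 1, close) + ")");
                i = close + 1; // whatever follows the closing slash starts a new clause
            } else {
                int j = i;
                while (j < q.length() && !Character.isWhitespace(q.charAt(j)) && q.charAt(j) != '/') {
                    j++;
                }
                out.add("TERM(" + q.substring(i, j) + ")");
                i = j;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Clauses are joined with OR, mirroring the default operator described above.
        System.out.println(String.join(" OR ", clauses("/foo.com/index.html")));
        // prints "REGEXP(foo.com) OR TERM(index.html)"
    }
}
```

Nothing in this simplified grammar objects to a term starting right after the closing slash. That gap is what the proposed "i" flag, and any stricter check, would change.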

 

I'd like to draw attention to this proposed change in behaviour because I think 
it could affect many existing systems. Arguably it may be a positive, drawing 
attention to a number of existing silent failures (unescaped searches for URLs 
or file paths), but some could equally see it as a negative, breaking change.

 

What is our BWC (backwards-compatibility) policy for changes to the query parser?

Do the benefits of the proposed new regex feature outweigh the costs of the 
breakages in your view?

 

https://issues.apache.org/jira/browse/LUCENE-9445?focusedCommentId=17196793&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17196793
