[ 
https://issues.apache.org/jira/browse/LUCENE-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107356#comment-17107356
 ] 

Mark Harwood commented on LUCENE-9370:
--------------------------------------

The PR is [here|https://github.com/apache/lucene-solr/pull/1516]. It also 
addresses a backslash bug introduced in LUCENE-9336.

> RegExpQuery should error for inappropriate use of \ character in input
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-9370
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9370
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: master (9.0)
>            Reporter: Mark Harwood
>            Priority: Minor
>
> The RegExp class is too lenient in parsing user input, which can confuse or 
> mislead users and cause backwards-compatibility issues as we enhance regex 
> support.
> In normal regular expression syntax the backslash is used to:
> *  escape a reserved character like  \. 
> *  use certain unreserved characters in a shorthand context e.g. \d means 
> digits [0-9]
>  
> The leniency bug in RegExp is that it adds an extra rule to this list: any 
> backslashed character that doesn't satisfy the above rules is taken 
> literally. For example, there's no reason to put a backslash in front of the 
> letter "p", but we accept \p as the letter p.
> Java's Pattern class will throw a parse exception given a meaningless 
> backslash like \p.
> We should too.
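
To make the comparison with Java concrete, here is a minimal sketch that asks java.util.regex.Pattern whether it accepts three kinds of backslash use. The class and method names (BackslashDemo, javaAccepts) are made up for illustration. The letter j is used instead of p because Java treats \p as the start of a property class such as \p{L}, whereas \j has no assigned meaning at all.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class BackslashDemo {
    /** Returns true if java.util.regex compiles the expression, false if it rejects it. */
    static boolean javaAccepts(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "\\." escapes a reserved character, "\\d" is a shorthand class,
        // "\\j" puts a backslash in front of a letter with no assigned meaning.
        for (String regex : new String[] {"\\.", "\\d", "\\j"}) {
            System.out.println(regex + " accepted by Pattern: " + javaAccepts(regex));
        }
    }
}
```

On a standard JDK the first two expressions should be accepted and the third rejected, since the Pattern documentation states that a backslash before an alphabetic character that does not denote an escaped construct is an error.
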
> In [LUCENE-9336|https://issues.apache.org/jira/browse/LUCENE-9336] we added 
> support for commonly supported regex shorthands like `\d`. Sadly this is a 
> breaking change, because the leniency had allowed \d to be accepted as 
> the letter d without an exception. Users were likely silently missing results 
> they were hoping for, and we made a BWC problem for ourselves in filling in 
> the gaps.
> I propose we do as other regex parsers do and throw an error on inappropriate 
> use of backslashes.
> This will be another breaking change, so it should target 9.0.
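
As a rough illustration of the proposed strictness, the sketch below rejects any backslash that is not followed by a reserved character or a known shorthand letter. It is not the code in the PR and not Lucene's RegExp parser. The class name StrictEscapeSketch, the method checkEscapes, and the two character sets are assumptions chosen for the example, and the check ignores quoted strings and other context a real parser has to handle.

```java
final class StrictEscapeSketch {
    // Assumed sets, for illustration only; the real parser's lists may differ.
    private static final String RESERVED = ".?+*|{}[]()\"\\#@&<>~";
    private static final String SHORTHAND = "dDsSwW";

    /** Throws IllegalArgumentException on a backslash that escapes nothing meaningful. */
    static void checkEscapes(String regex) {
        for (int i = 0; i < regex.length(); i++) {
            if (regex.charAt(i) != '\\') {
                continue;
            }
            if (i + 1 >= regex.length()) {
                throw new IllegalArgumentException("trailing backslash at position " + i);
            }
            char next = regex.charAt(i + 1);
            if (RESERVED.indexOf(next) < 0 && SHORTHAND.indexOf(next) < 0) {
                throw new IllegalArgumentException(
                        "invalid escape \\" + next + " at position " + i);
            }
            i++; // skip the escaped character so "\\\\" is not read as two escapes
        }
    }

    public static void main(String[] args) {
        checkEscapes("a\\.b\\d+"); // escaped dot and digit shorthand: no exception
        try {
            checkEscapes("\\p");   // backslash before a plain letter
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast like this means a later release can assign a meaning to a new escape without changing the behaviour of any expression that was previously accepted.
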



