[ 
https://issues.apache.org/jira/browse/LUCENE-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated LUCENE-9315:
---------------------------------------
    Attachment: LUCENE-9315.patch
        Status: Open  (was: Open)


I'm attaching a "strawman" patch that demonstrates an implementation of these 
changes.  Here's a table showing some non-trivial queries, and how they are 
parsed in both the "Current" QueryParserBase implemantion, as well as the 
"Proposed" implemenation with my patch...

||Input|| ||Current default=OR||Current default=AND|| ||Proposed 
default=OR||Proposed default=AND||
|X AND Y OR Z| |{color:#172b4d}+x +y z{color}|{color:#172b4d}+x y z{color}| |+x 
+y z|+x +y z|
|Z OR Y AND X| |z +y +x|z +y +x| |z y +x|z y +x|
|X Y OR Z| |x y z|+x y z| |x y z|+x y z|
|X Y AND Z| |x +y +z|+x +y +z| |x +y +z|+x +y +z|
|+Y OR Z| |+y z|y z| |+y z|+y z|
|+Y AND Z| |+y +z|+y +z| |+y +z|+y +z|
|Z OR +Y| |z +y|z y| |z +y|z +y|
|Z AND +Y| |+z +y|+z +y| |+z +y|+z +y|


Two things to note:
* This patch has a lot of nocommits around the fact that it "breaks" 
{{BooleanClause}} to remove the requirement that {{Occur}} must be non-null -- 
this is purely to keep the {{QueryParserBase.java}} porrtions of hte change 
simple and focusd on the _logical_ changes being made.
** If we decide to proceed with this idea, then the "correct" fix is to stop 
using {{BooleanClause}} directly in the various {{QueryParserBase}} protected 
methods, and instead introduce some new, equivilent, data structure for 
modeling "A Query Clause with _optional_ Occur metadata"
* this change breaks only one existing test: 
{{QueryParserTestBase.testPrecedence()}} -- which nominally tests that {{"A AND 
B OR C AND D"}} parses equivilently to {{"+A +B +C +D"}}, but this test only 
currently passes whhen {{QueryParser.getDefaultOperator()}} returns {{OR}} ... 
in the case of {{QueryParser.setDefaultOperator(AND)}} the {{" OR "}} in that 
input is effecitvely ignored.
** Wih the patch, the test is updated to consistently expect this (effectively 
"fully qualified") input string to be parsed as {{"+A +B C +D"}} (ie: left to 
right) regardless of what the default operator is.
** BUT: These test changes break 
{{oal.queryparser.flexible.standard.TestStandardQP.testPrecedence}} ... IIUC 
this is testing that the flexible {{StandardSyntaxParser}} is equivilent to 
Classic {{QueryParser}} -- I haven't dug into this much, but i think if we want 
to move forward with these semantic changes to {{QueryParser}} it just means 
some equivilent changes to the grammer and/or node builders for 
{{StandardSyntaxParser}} (or maybe just {{BooleanQuery2ModifierNodeProcessor}} 
?)


What do folks think of persuing this change, knowing it's a back compat break?


> redfine (Classi & Standard) QueryParser semantics to be consistent: 
> prioritize prefix op > infix op > default op
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9315
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9315
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: LUCENE-9315.patch
>
>
> For as long as I can remember, the way QueryParser deals with the "infix" 
> operators {{AND}} & {{OR}} hasn't made much sense unless they are used 
> consistently to express pure boolean logic (ie: always explicitly specified, 
> and never more then 2 clauses to a query). As soon as you have query strings 
> where a BooleanQuery has more then 2 clauses, or you have query strings that 
> mix {{AND}} & {{OR}} with the "prefix" {{+}} & {{-|NOT}} operators, or query 
> strings where not every clause has an operator, or (absolutely the most 
> confusing) you mix the types of operators _and_ change the QueryParser 
> "default op" from {{OR}} to {{AND}} the behavior just becomes inpossible to 
> make sense of for new users - and hard to explain/justify. (It's not 
> precedence based, it's not left to right, it's just ... weird.)
> The problem is so confusing to new users, that I wrote a blog post almost 10 
> years ago (?!?) trying to convince people that using {{AND}} & {{OR}} was a 
> terrible idea unless they were used only in strict boolean expressions)...
> [https://lucidworks.com/post/why-not-and-or-and-not/]
> ...and yet it still regularly comes up as a point of confusion.
> A lot this weird behavior seems to be historical artifact of how 
> {{QueryParserBase.addClauses()}} works - a method whose basic semantics 
> haven't really changed since Lucne 1.0.1, back before the introductiong of 
> {{QueryParser.setDefaultOperator()}}. Some of those early choices seemed to 
> be predicated on the idea that {{AND}} should take "precedence" (i use that 
> term loosely) over {{OR}} as it parses clauses left to right, purely becuase 
> {{OR}} was the "default" assumption (and had - and stll has - no 
> corrisponding "prefix" operator). As functionality in QueryParser has grown, 
> a lot of the assumptions made in the code and the resulting parse behavior 
> really make no sense to users, particularly in "non trivial" query strings. 
> In many cases, parse behavior that can seem "intentional" to new users, even 
> for input where every clause is impacted by an explicit {{AND}} or {{OR}} 
> operators, can suddenly be flipped on it's head when the "default operator" 
> is changed (ex: "{{X AND Y OR Z}}"), or if the only the order of "clauses" in 
> the string changes (ex: previous example vs "{{Z OR Y AND X}}") even though 
> it's clear from other queries that there is no strict precedence of operators.
> ----
> The "root" of the problem, as I see it, is that 
> {{QueryParserBase.addClauses()}} allows {{AND}} & {{OR}} to modify the 
> {{Occur}} property of the previously parsed {{BooleanClause}} depending on 
> _if_ that {{BooleanClause.getOccur()}} value matches the "default operator" 
> for the parser, w/o any considerationg to _why_ that that {{getOccur()}} 
> value matches the "default operator" - ie: did it actually come from the 
> "default" or was it explicitly set by something in the query string? (ie: a 
> prior infix operator)
> ----
> I propose that starting with Lucene 9.0, we redefine the semantics in 
> {{QueryParserBase}} such that:
>  * "Prefix" operators ({{+}} | {{-}} | {{NOT}}) always take precedence (over 
> any "Infix" operator or QueryParser default) in setting the {{Occur}} value 
> of the clause they prefix.
>  * "Infix" operators ({{AND}} | {{OR}}) are evaluated left to right and used 
> to set the {{Occur}} value of the clauses adjacent to them (that do not 
> already have a {{Occur}} value set by a "Pefix" operator)
>  * the {{QueryParser.getDefaultOperator()}} is only used to set the {{Occur}} 
> value of any clause that did not get an {{Occur}} value assigned by either a 
> prefix or (prior) infix operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to