Chris M. Hostetter created LUCENE-9315:
------------------------------------------
Summary: redefine (Classic & Standard) QueryParser semantics to be consistent: prioritize prefix op > infix op > default op
Key: LUCENE-9315
URL: https://issues.apache.org/jira/browse/LUCENE-9315
Project: Lucene - Core
Issue Type: Improvement
Components: core/queryparser
Reporter: Chris M. Hostetter

For as long as I can remember, the way QueryParser deals with the "infix" operators {{AND}} & {{OR}} hasn't made much sense unless they are used consistently to express pure boolean logic (ie: always explicitly specified, and never more than 2 clauses to a query).

As soon as you have query strings where a BooleanQuery has more than 2 clauses, or you have query strings that mix {{AND}} & {{OR}} with the "prefix" {{+}} & {{-|NOT}} operators, or query strings where not every clause has an operator, or (absolutely the most confusing) you mix the types of operators _and_ change the QueryParser "default op" from {{OR}} to {{AND}}, the behavior just becomes impossible for new users to make sense of - and hard to explain/justify. (It's not precedence based, it's not left to right, it's just ... weird.)

The problem is so confusing to new users that I wrote a blog post almost 10 years ago (?!?) trying to convince people that using {{AND}} & {{OR}} was a terrible idea (unless they were used only in strict boolean expressions)...

[https://lucidworks.com/post/why-not-and-or-and-not/]

...and yet it still regularly comes up as a point of confusion.

A lot of this weird behavior seems to be a historical artifact of how {{QueryParserBase.addClauses()}} works - a method whose basic semantics haven't really changed since Lucene 1.0.1, back before the introduction of {{QueryParser.setDefaultOperator()}}. Some of those early choices seemed to be predicated on the idea that {{AND}} should take "precedence" (I use that term loosely) over {{OR}} as it parses clauses left to right, purely because {{OR}} was the "default" assumption (and had - and still has - no corresponding "prefix" operator).
As functionality in QueryParser has grown, a lot of the assumptions made in the code and the resulting parse behavior really make no sense to users, particularly in "non trivial" query strings. In many cases, parse behavior that can seem "intentional" to new users, even for input where every clause is impacted by an explicit {{AND}} or {{OR}} operator, can suddenly be flipped on its head when the "default operator" is changed (ex: "{{X AND Y OR Z}}"), or if only the order of "clauses" in the string changes (ex: previous example vs "{{Z OR Y AND X}}"), even though it's clear from other queries that there is no strict precedence of operators.

----

The "root" of the problem, as I see it, is that {{QueryParserBase.addClauses()}} allows {{AND}} & {{OR}} to modify the {{Occur}} property of the previously parsed {{BooleanClause}} depending on _if_ that {{BooleanClause.getOccur()}} value matches the "default operator" for the parser, without any consideration as to _why_ that {{getOccur()}} value matches the "default operator" - ie: did it actually come from the "default", or was it explicitly set by something in the query string (ie: a prior infix operator)?

----

I propose that starting with Lucene 9.0, we redefine the semantics in {{QueryParserBase}} such that:

* "Prefix" operators ({{+}} | {{-}} | {{NOT}}) always take precedence (over any "Infix" operator or QueryParser default) in setting the {{Occur}} value of the clause they prefix.
* "Infix" operators ({{AND}} | {{OR}}) are evaluated left to right and used to set the {{Occur}} value of the clauses adjacent to them (that do not already have an {{Occur}} value set by a "Prefix" operator).
* The {{QueryParser.getDefaultOperator()}} is only used to set the {{Occur}} value of any clause that did not get an {{Occur}} value assigned by either a prefix or (prior) infix operator.
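
To make the three proposed rules concrete, here is a minimal, self-contained sketch of how they might assign {{Occur}} values. It is not Lucene code and does not reflect how {{QueryParserBase.addClauses()}} is written: {{OccurSketch}}, {{Clause}}, {{assign()}} and {{render()}} are hypothetical names invented for illustration, and the input is assumed to be flat, whitespace-separated terms and operators (no grouping, phrases or fields). One point the proposal leaves open is what happens to a clause that sits between two infix operators; this sketch assumes the later operator overwrites the earlier one, since only prefix-assigned values are described as protected.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy model of the proposed semantics, not Lucene's parser.
class OccurSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** Hypothetical stand-in for a parsed clause (not Lucene's BooleanClause). */
    static final class Clause {
        final String term;
        Occur occur;         // null until a prefix or infix operator assigns it
        boolean fromPrefix;  // true when a prefix operator set the value
        Clause(String term) { this.term = term; }
    }

    static List<Clause> assign(String query, Occur defaultOp) {
        List<Clause> clauses = new ArrayList<>();
        Occur pendingPrefix = null; // from a standalone NOT awaiting its clause
        Occur pendingInfix = null;  // from AND/OR awaiting its right-hand clause
        for (String tok : query.trim().split("\\s+")) {
            switch (tok) {
                case "NOT":
                    pendingPrefix = Occur.MUST_NOT;
                    break;
                case "AND":
                case "OR":
                    pendingInfix = tok.equals("AND") ? Occur.MUST : Occur.SHOULD;
                    // Rule 2: set the left neighbour unless a prefix op owns it.
                    // ASSUMPTION: a value from an earlier infix op is overwritten.
                    if (!clauses.isEmpty()) {
                        Clause left = clauses.get(clauses.size() - 1);
                        if (!left.fromPrefix) left.occur = pendingInfix;
                    }
                    break;
                default:
                    String term = tok;
                    Occur prefix = pendingPrefix;
                    if (tok.length() > 1 && (tok.charAt(0) == '+' || tok.charAt(0) == '-')) {
                        prefix = tok.charAt(0) == '+' ? Occur.MUST : Occur.MUST_NOT;
                        term = tok.substring(1);
                    }
                    Clause c = new Clause(term);
                    if (prefix != null) {            // Rule 1: prefix always wins
                        c.occur = prefix;
                        c.fromPrefix = true;
                    } else if (pendingInfix != null) { // Rule 2: right neighbour
                        c.occur = pendingInfix;
                    }
                    clauses.add(c);
                    pendingPrefix = null;
                    pendingInfix = null;
            }
        }
        // Rule 3: the default operator only fills in what is still unassigned.
        for (Clause c : clauses) {
            if (c.occur == null) c.occur = defaultOp;
        }
        return clauses;
    }

    /** Renders clauses in "+must should -mustNot" style for easy comparison. */
    static String render(List<Clause> clauses) {
        StringBuilder sb = new StringBuilder();
        for (Clause c : clauses) {
            if (sb.length() > 0) sb.append(' ');
            if (c.occur == Occur.MUST) sb.append('+');
            else if (c.occur == Occur.MUST_NOT) sb.append('-');
            sb.append(c.term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (Occur def : new Occur[] { Occur.SHOULD, Occur.MUST }) {
            System.out.println(def + ": " + render(assign("X AND Y OR Z", def)));
            System.out.println(def + ": " + render(assign("a b", def)));
        }
    }
}
```

Under these assumptions, "{{X AND Y OR Z}}" yields the same {{Occur}} values whichever default operator is configured, because every clause is touched by an explicit infix operator; only a string with no operators at all, such as "{{a b}}", changes with the default.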