[
https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506703
]
Doron Cohen commented on LUCENE-933:
------------------------------------
So an acceptable solution is:
QueryParser will ignore empty clauses (e.g. ' ( ) ' ) resulting from word
filtering, the same as it already does for single words.
A straightforward fix is for QueryParser to avoid adding null (inner) queries
to (outer) clause sets. (It makes sense, too.)
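To make the idea concrete, here is a minimal sketch of the null-skipping rule.
It does not use the real Lucene classes and is not the actual patch: queries
are modeled as plain strings, and the names EmptyClauseSketch, term, addClause
and group are all hypothetical, as is the one-word stop list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class EmptyClauseSketch {
    // Assumption: a one-word stop list standing in for the Analyzer's filtering.
    static final Set<String> STOP_WORDS = Collections.singleton("stop");

    /** Returns the word as a "query", or null when filtering removes it. */
    static String term(String word) {
        return STOP_WORDS.contains(word) ? null : word;
    }

    /** Adds q to the clause list unless it is null (nothing survived filtering). */
    static void addClause(List<String> clauses, String q) {
        if (q != null) {
            clauses.add(q);
        }
    }

    /** Builds a query from the clauses; a group with no clauses collapses to null. */
    static String group(List<String> clauses) {
        return clauses.isEmpty() ? null : String.join(" ", clauses);
    }
}
```

With these stand-ins, an inner group built only from "stop" comes back as null,
so the outer clause list for "a (stop) b" ends up holding just "a" and "b".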
However, this has a side effect:
For queries that become "empty" as a result of filtering (stopping),
QueryParser would now return null.
This is an API semantics change, because applications that used to get a
BooleanQuery with 0 clauses as the parse result would now get a null query.
Here is a closer look at the behavior change:
Original behavior:
(1) parse(" ") == ParseException
(2) parse("( )") == ParseException
(3) parse("stop") == " "
(actually a boolean query with 0 clauses)
(4) parse("(stop)") == " "
(actually a boolean query with 0 clauses)
(5) parse("a stop b") == "a b"
(6) parse("a (stop) b") == "a () b"
(middle part is a boolean query with 0 clauses)
(7) parse("a ((stop)) b") == "a () b"
(again middle part is a boolean query with 0 clauses)
Modified behavior:
(3) parse("stop") == null
(4) parse("(stop)") == null
(6) parse("a (stop) b") == "a b"
(7) parse("a ((stop)) b") == "a b"
I think the modified behavior is the right one - applications can test a query
for being null and realize that it is a no-op.
However, backwards compatibility is important - would this change break existing
applications with annoying new NPEs?
As an alternative, QueryParser parse() methods can be modified to return a
phony empty BQ instead of returning null, for the sake of backwards
compatibility.
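A rough sketch of that alternative, again with stand-ins rather than the real
Lucene API: BoolQuery, parseModified and parseCompat are hypothetical names,
the whitespace splitting is only a toy substitute for the parser, and the
ParseException cases (1) and (2) are not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class NullSafeParseSketch {
    // Assumption: a one-word stop list standing in for the Analyzer's filtering.
    static final Set<String> STOP_WORDS = Collections.singleton("stop");

    /** Hypothetical stand-in for a BooleanQuery: just a list of clause strings. */
    static final class BoolQuery {
        final List<String> clauses;
        BoolQuery(List<String> clauses) { this.clauses = clauses; }
    }

    /** Toy version of the modified parser: null when nothing survives filtering. */
    static BoolQuery parseModified(String input) {
        List<String> kept = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                kept.add(word);
            }
        }
        return kept.isEmpty() ? null : new BoolQuery(kept);
    }

    /** Backwards-compatible entry point: swaps null for an empty query. */
    static BoolQuery parseCompat(String input) {
        BoolQuery q = parseModified(input);
        return q != null ? q : new BoolQuery(Collections.<String>emptyList());
    }
}
```

Callers of parseCompat keep getting a non-null query with 0 clauses for input
such as "stop", so existing code that dereferences the result would not hit
a new NPE, while the inner empty groups are still dropped.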
Thoughts?
> QueryParser can produce empty sub BooleanQueries when Analyzer produces no
> tokens for input
> -------------------------------------------------------------------------------------------
>
> Key: LUCENE-933
> URL: https://issues.apache.org/jira/browse/LUCENE-933
> Project: Lucene - Java
> Issue Type: Bug
> Reporter: Hoss Man
> Assignee: Doron Cohen
>
> as triggered by SOLR-261, if you have a query like this...
> +foo:BBB +(yak:AAA baz:CCC)
> ...where the analyzer produces no tokens for the "yak:AAA" or "baz:CCC"
> portions of the query (possibly because they are stop words) the resulting
> query produced by the QueryParser will be...
> +foo:BBB +()
> ...that is a BooleanQuery with two required clauses, one of which is an empty
> BooleanQuery with no clauses.
> this does not appear to be "good" behavior.
> In general, QueryParser should be smarter about what it does when parsing
> encounters parens whose contents result in an empty BooleanQuery -- but
> what exactly it should do in the following situations...
> a) +foo:BBB +()
> b) +foo:BBB ()
> c) +foo:BBB -()
> ...is up for interpretation. I would think situation (b) clearly lends
> itself to dropping the sub-BooleanQuery completely. situation (c) may also
> lend itself to that solution, since semantically it means "don't allow a match
> on any queries in the empty set of queries". .... I have no idea what the
> "right" thing to do for situation (a) is.