[
https://issues.apache.org/jira/browse/LUCENE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003129#comment-15003129
]
Hoss Man commented on LUCENE-6889:
----------------------------------
maybe i'm reading the patch wrong, but it looks like the "actuallyRewritten"
check from the recursive rewriting will return before several of the
optimizations. Shouldn't things like "remove duplicate FILTER and MUST_NOT
clauses" and "remove FILTER clauses that are also MUST clauses" still be
tested/done against the rewritten sub-clauses?
likewise isn't doing the "remove FILTER clauses that are also MUST clauses"
optimization still worthwhile even if the "remove duplicate FILTER and MUST_NOT
clauses" optimization finds & removes things? (it also looks like it has a
short-circuit return)
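To make the control-flow concern concrete, here is a minimal sketch in plain Java. It is a toy model, not the patch and not Lucene's BooleanQuery: clauses are plain strings, and {{RewriteSketch}}, {{rewriteSub}} and {{simplify}} are hypothetical names invented for illustration. It only shows the shape being asked for, where the sub-clauses are rewritten first and each later optimization then runs against the rewritten clauses with no early return in between.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: clauses are strings grouped by occur type (assumption for illustration).
final class RewriteSketch {
    // MUST stays a list: duplicate MUST clauses contribute to the score, so they are kept.
    final List<String> must = new ArrayList<>();
    // FILTER and MUST_NOT do not score, so storing them in sets collapses duplicates.
    final Set<String> filter = new LinkedHashSet<>();
    final Set<String> mustNot = new LinkedHashSet<>();

    /** Hypothetical stand-in for rewriting a sub-clause: trims and lower-cases it. */
    static String rewriteSub(String clause) {
        return clause.trim().toLowerCase(Locale.ROOT);
    }

    /** Applies every step in sequence instead of returning after the first change. */
    static RewriteSketch simplify(List<String> must, List<String> filter, List<String> mustNot) {
        RewriteSketch out = new RewriteSketch();
        // Step 1: rewrite all sub-clauses; do not return even if something changed.
        for (String c : must) out.must.add(rewriteSub(c));
        // Step 2: duplicate FILTER / MUST_NOT clauses collapse as they enter the sets,
        // and this is evaluated on the rewritten form of each clause.
        for (String c : filter) out.filter.add(rewriteSub(c));
        for (String c : mustNot) out.mustNot.add(rewriteSub(c));
        // Step 3: drop FILTER clauses that are also MUST clauses, whether or not
        // step 2 removed anything.
        out.filter.removeAll(out.must);
        return out;
    }
}
```

With MUST = [A], FILTER = ["a ", a, b] and MUST_NOT = [x, X], this toy version ends up with MUST = [a], FILTER = [b] and MUST_NOT = [x]: the duplicates only become visible after the sub-clauses are rewritten, and the FILTER/MUST overlap is still removed even though deduplication already changed the clause set.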
bq. ... as well as a random test that makes sure that the same set of matches
and scores are produced if no rewriting is performed.
why not randomize the docs/fields in the index as well? At first glance one
concern i have is that no doc has a single term more than once, so spotting
subtle score discrepancies between the query and its optimized version may
never come into play with this test.
other small concerns about the current random query generation:
* is {{rarely()}} really appropriate for the BoostQuery wrapping? is that
something that really makes sense to increase depending on whether the test is
nightly? ... seems like something more straightforward like
{{0==TestUtil.nextInt(random(), 0, 10)}} would make more sense for these tests
* randomizing setMinimumNumberShouldMatch between 0 and numClauses means that
it's going to be very rare for the minimumNumberShouldMatch setting to actually
impact the query unless there are also a lot of random SHOULD clauses (ie: if
there are 5 clauses but only 2 SHOULD clauses there's only a 2:5 chance of that
setting getting a random value that actually affects anything) ... probably
better to count the number of SHOULD clauses actually generated and then
randomize the setting between 0 and that count + 1.
* MatchNoDocsQuery should probably be included in the randomization for better
coverage of all the optimization situations.
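The minimumNumberShouldMatch suggestion above could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses {{java.util.Random}} and a local {{Occur}} enum instead of Lucene's test utilities and BooleanClause.Occur, and {{MinShouldMatchSketch}}, {{countShould}} and {{randomMinShouldMatch}} are hypothetical names, not part of the patch.

```java
import java.util.List;
import java.util.Random;

// Sketch only: stand-ins for the test's random clause generation (assumed names).
final class MinShouldMatchSketch {
    // Local stand-in for the four occur types of a boolean clause.
    enum Occur { MUST, FILTER, SHOULD, MUST_NOT }

    /** Counts how many of the generated clauses are SHOULD clauses. */
    static int countShould(List<Occur> clauses) {
        int count = 0;
        for (Occur occur : clauses) {
            if (occur == Occur.SHOULD) {
                count++;
            }
        }
        return count;
    }

    /**
     * Picks a value in [0, shouldCount + 1] inclusive, so the range is tied to
     * the number of SHOULD clauses rather than to the total clause count.
     */
    static int randomMinShouldMatch(Random random, List<Occur> clauses) {
        int shouldCount = countShould(clauses);
        // nextInt(bound) is exclusive of bound, hence + 2 to include shouldCount + 1.
        return random.nextInt(shouldCount + 2);
    }
}
```

For a query with 5 clauses of which 2 are SHOULD, this draws from 0..3 instead of 0..5, so most draws land on values where the setting interacts with the SHOULD clauses that actually exist, and the count + 1 upper end still exercises the case where the setting exceeds the number of SHOULD clauses.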
> BooleanQuery.rewrite could easily optimize some simple cases
> ------------------------------------------------------------
>
> Key: LUCENE-6889
> URL: https://issues.apache.org/jira/browse/LUCENE-6889
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-6889.patch
>
>
> Follow-up of SOLR-8251: APIs and user interfaces sometimes encourage writing
> BooleanQuery instances that are not optimal. For instance, a typical case that
> happens often with Solr/Elasticsearch is to send a request that has a
> MatchAllDocsQuery as a query and some filter, which could be executed more
> efficiently by directly wrapping the filter into a ConstantScoreQuery.
> Here are some ideas of rewrite operations that BooleanQuery could perform:
> - remove FILTER clauses when they are also a MUST clause
> - rewrite queries of the form "+*:* #filter" to a ConstantScoreQuery(filter)
> - rewrite to a MatchNoDocsQuery when a clause that is a MUST or FILTER
> clause is also a MUST_NOT clause