[ 
https://issues.apache.org/jira/browse/LUCENE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003129#comment-15003129
 ] 

Hoss Man commented on LUCENE-6889:
----------------------------------

maybe i'm reading the patch wrong, but it looks like the "actuallyRewritten" 
check from the recursive rewriting will return before several of the 
optimizations.  Shouldn't things like "remove duplicate FILTER and MUST_NOT 
clauses" and "remove FILTER clauses that are also MUST clauses" still be 
tested/done against the rewritten sub-clauses?

likewise isn't doing the "remove FILTER clauses that are also MUST clauses" 
optimization still worthwhile even if the "remove duplicate FILTER and MUST_NOT 
clauses" optimization finds & removes things? (it also looks like it has a 
short-circuit return)
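
To make the control-flow concern concrete, here is a minimal sketch of a rewrite pass in which no step returns early, so the later clean-ups always see the already-rewritten sub-clauses. It is not the patch and does not use Lucene classes: {{ClauseSets}}, {{RewriteSketch}} and the string-based clauses are hypothetical stand-ins, and the recursive sub-clause rewrite is modelled as a plain string mapping.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a BooleanQuery: clauses are plain strings here.
final class ClauseSets {
    // MUST clauses stay in a list: duplicates contribute to the score, so they are kept.
    final List<String> must = new ArrayList<>();
    // FILTER and MUST_NOT clauses do not score, so duplicates can be collapsed.
    final Set<String> filter = new LinkedHashSet<>();
    final Set<String> mustNot = new LinkedHashSet<>();
}

final class RewriteSketch {
    // Applies every step in a single pass; no step short-circuits the ones after it.
    static ClauseSets rewrite(ClauseSets in, UnaryOperator<String> rewriteSub) {
        ClauseSets out = new ClauseSets();

        // Step 1: rewrite each sub-clause (assumption: modelled as a string mapping).
        for (String c : in.must) out.must.add(rewriteSub.apply(c));
        // Step 2: duplicate FILTER / MUST_NOT clauses collapse because the targets are
        // sets, and the comparison happens on the *rewritten* sub-clauses.
        for (String c : in.filter) out.filter.add(rewriteSub.apply(c));
        for (String c : in.mustNot) out.mustNot.add(rewriteSub.apply(c));

        // Step 3: drop FILTER clauses that are also MUST clauses. This runs even
        // when steps 1 or 2 already changed something.
        out.filter.removeAll(out.must);
        return out;
    }
}
```

With MUST = [a], FILTER = [A, a, b] and a lower-casing "rewrite", the two filters only become duplicates of each other (and of the MUST clause) after step 1, which is the situation an early return would miss.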

bq. ... as well as a random test that makes sure that the same set of matches 
and scores are produced if no rewriting is performed.

why not randomize the docs/fields in the index as well?  At first glance one 
concern i have is that no doc has a single term more than once, so spotting 
subtle score discrepancies between the query and its optimized version may 
never come into play with this test.
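
One possible way to get repeated terms into the test docs is to draw each field value from a small vocabulary with replacement. The sketch below uses plain {{java.util.Random}} rather than the Lucene test framework; {{RandomDocSketch}}, {{VOCAB}} and {{randomFieldValue}} are hypothetical names, and the vocabulary size is an arbitrary assumption.

```java
import java.util.Random;

final class RandomDocSketch {
    // Hypothetical vocabulary; kept small so that terms repeat within a doc.
    static final String[] VOCAB = {"a", "b", "c", "d"};

    // Builds a whitespace-separated field value of 1..maxTerms terms,
    // sampled with replacement so the same term can occur more than once.
    static String randomFieldValue(Random random, int maxTerms) {
        int numTerms = 1 + random.nextInt(maxTerms);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numTerms; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(VOCAB[random.nextInt(VOCAB.length)]);
        }
        return sb.toString();
    }
}
```

Any value with more terms than the vocabulary has entries is guaranteed to contain a repeated term, so term frequencies above 1 show up regularly and can influence scores.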

other small concerns about the current random query generation:
* is {{rarely()}} really appropriate for the BoostQuery wrapping? is that 
something that really makes sense to increase depending on whether the test is 
nightly? ... seems like something more straightforward like 
{{0==TestUtil.nextInt(random(), 0, 10)}} would make more sense for these tests
* randomizing setMinimumNumberShouldMatch between 0 and numClauses means that 
it's going to be very rare for the minimumNumberShouldMatch setting to actually 
impact the query unless there are also a lot of random SHOULD clauses (ie: if 
there are 5 clauses but only 2 SHOULD clauses there's only a 2:5 chance of that 
setting getting a random value that actually affects anything) ... probably 
better to count the actual # of SHOULD clauses generated randomly and then 
randomize the setting between 0 and #+1.
* MatchNoDocsQuery should probably be included in the randomization to get 
better coverage of all the optimization situations.
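
The first two suggestions above could look roughly like the sketch below. It is stand-alone Java, not the test from the patch: {{RandomClausesSketch}} and its methods are hypothetical, the {{Occur}} enum is a local stand-in for {{BooleanClause.Occur}}, and {{java.util.Random}} replaces {{random()}} / {{TestUtil}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class RandomClausesSketch {
    // Local stand-in for BooleanClause.Occur (assumption: same four values).
    enum Occur { MUST, FILTER, SHOULD, MUST_NOT }

    // Picks an occur type uniformly at random for each clause.
    static List<Occur> randomOccurs(Random random, int numClauses) {
        Occur[] values = Occur.values();
        List<Occur> occurs = new ArrayList<>();
        for (int i = 0; i < numClauses; i++) {
            occurs.add(values[random.nextInt(values.length)]);
        }
        return occurs;
    }

    // Counts the SHOULD clauses that were actually generated, then picks
    // minimumNumberShouldMatch from the inclusive range [0, numShould + 1].
    static int randomMinShouldMatch(Random random, List<Occur> occurs) {
        int numShould = 0;
        for (Occur occur : occurs) {
            if (occur == Occur.SHOULD) {
                numShould++;
            }
        }
        return random.nextInt(numShould + 2);
    }

    // Fixed 1-in-11 chance, independent of nightly settings; this mirrors
    // 0 == TestUtil.nextInt(random(), 0, 10), whose bounds are both inclusive.
    static boolean wrapInBoost(Random random) {
        return random.nextInt(11) == 0;
    }
}
```

Bounding the setting by the SHOULD count plus one keeps most generated values in the range where they constrain matching, while still occasionally producing a value that no document can satisfy.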

> BooleanQuery.rewrite could easily optimize some simple cases
> ------------------------------------------------------------
>
>                 Key: LUCENE-6889
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6889
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6889.patch
>
>
> Follow-up of SOLR-8251: APIs and user interfaces sometimes encourage users to 
> write BooleanQuery instances that are not optimal, for instance a typical case that 
> happens often with Solr/Elasticsearch is to send a request that has a 
> MatchAllDocsQuery as a query and some filter, which could be executed more 
> efficiently by directly wrapping the filter into a ConstantScoreQuery.
> Here are some ideas of rewrite operations that BooleanQuery could perform:
>  - remove FILTER clauses when they are also a MUST clause
>  - rewrite queries of the form "+*:* #filter" to a ConstantScoreQuery(filter)
>  - rewrite to a MatchNoDocsQuery when a clause that is a MUST or FILTER 
> clause is also a MUST_NOT clause



