[ https://issues.apache.org/jira/browse/LUCENE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003129#comment-15003129 ]
Hoss Man commented on LUCENE-6889:
----------------------------------

Maybe I'm reading the patch wrong, but it looks like the "actuallyRewritten" check from the recursive rewriting will return before several of the optimizations. Shouldn't things like "remove duplicate FILTER and MUST_NOT clauses" and "remove FILTER clauses that are also MUST clauses" still be tested/done against the rewritten sub-clauses?

Likewise, isn't doing the "remove FILTER clauses that are also MUST clauses" optimization still worthwhile even if the "remove duplicate FILTER and MUST_NOT clauses" optimization finds & removes things? (It also looks like it has a short-circuit return.)

bq. ... as well as a random test that makes sure that the same set of matches and scores are produced if no rewriting is performed.

Why not randomize the docs/fields in the index as well? At first glance, one concern I have is that no doc has a single term more than once, so spotting subtle score discrepancies between the query and its optimized version may never come into play with this test.

Other small concerns about the current random query generation:

* Is {{rarely()}} really appropriate for the BoostQuery wrapping? Is that something that really makes sense to increase depending on whether the test is nightly? ... Seems like something more straightforward like {{0==TestUtil.nextInt(random(), 0, 10)}} would make more sense for these tests.
* Randomizing setMinimumNumberShouldMatch between 0 and numClauses means that it's going to be very rare for the minimumNumberShouldMatch setting to actually impact the query unless there are also a lot of random SHOULD clauses (ie: if there are 5 clauses but only 2 SHOULD clauses there's only a 2:5 chance of that setting getting a random value that actually affects anything) ... probably better to count the actual # of SHOULD clauses generated randomly and then randomize the setting between 0 and #+1.
* MatchNoDocsQuery should probably be included in the randomization for better coverage of all the optimization situations.

> BooleanQuery.rewrite could easily optimize some simple cases
> ------------------------------------------------------------
>
>                 Key: LUCENE-6889
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6889
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6889.patch
>
>
> Follow-up of SOLR-8251: APIs and user interfaces sometimes encourage writing
> BooleanQuery instances that are not optimal. For instance, a typical case that
> happens often with Solr/Elasticsearch is to send a request that has a
> MatchAllDocsQuery as a query and some filter, which could be executed more
> efficiently by directly wrapping the filter into a ConstantScoreQuery.
> Here are some ideas of rewrite operations that BooleanQuery could perform:
> - remove FILTER clauses when they are also a MUST clause
> - rewrite queries of the form "+*:* #filter" to a ConstantScoreQuery(filter)
> - rewrite to a MatchNoDocsQuery when a clause that is a MUST or FILTER
>   clause is also a MUST_NOT clause
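
The review comment's suggestion about minimumNumberShouldMatch (count the SHOULD clauses that were actually generated, then randomize the setting between 0 and that count plus one) could look roughly like the sketch below. It is deliberately Lucene-free so it runs on its own: the nested {{Occur}} enum is a hypothetical stand-in for Lucene's clause occurrence types, and the class and method names are invented for illustration. They are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class MinShouldMatchSketch {

  // Hypothetical stand-in for Lucene's clause occurrence types,
  // so that the sketch compiles without Lucene on the classpath.
  enum Occur { MUST, FILTER, SHOULD, MUST_NOT }

  /** Picks a random occurrence type for each of numClauses clauses. */
  static List<Occur> randomOccurs(Random random, int numClauses) {
    Occur[] values = Occur.values();
    List<Occur> occurs = new ArrayList<>(numClauses);
    for (int i = 0; i < numClauses; i++) {
      occurs.add(values[random.nextInt(values.length)]);
    }
    return occurs;
  }

  /** Counts the SHOULD clauses that were actually generated. */
  static int countShould(List<Occur> occurs) {
    int count = 0;
    for (Occur occur : occurs) {
      if (occur == Occur.SHOULD) {
        count++;
      }
    }
    return count;
  }

  /**
   * Returns a value uniformly distributed in [0, numShould + 1] (both inclusive).
   * The top value exceeds the number of SHOULD clauses on purpose, so the
   * "cannot be satisfied" case is exercised too.
   */
  static int randomMinShouldMatch(Random random, int numShould) {
    return random.nextInt(numShould + 2);
  }

  public static void main(String[] args) {
    Random random = new Random();
    List<Occur> occurs = randomOccurs(random, 5);
    int numShould = countShould(occurs);
    int minShouldMatch = randomMinShouldMatch(random, numShould);
    System.out.println(occurs + " -> minimumNumberShouldMatch=" + minShouldMatch);
  }
}
```

In an actual Lucene test the random source would presumably come from the test framework rather than {{new Random()}}, and the chosen value would be passed to setMinimumNumberShouldMatch on the query being built. Those details are left out here because they depend on the patch.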