I'll have to look further with the team on what these queries look like in terms of relative cheapness and throughput.
RE MinShouldMatchSumScorer.cost() (static method called by Boolean2ScorerSupplier.computeCost) -- it's pretty easy to see that this is called with minShouldMatch==0 (or 1). Set a conditional breakpoint in your IDE and run TestBooleanMinShouldMatch.testRandomQueries On Thu, Jun 7, 2018 at 3:43 PM Adrien Grand <jpou...@gmail.com> wrote: > I suspect this could only show up as a bottleneck if they run very cheap > queries (low cost) at a very high throughput? Is it the case? I've seen a > couple workloads like that in the past and profilers suggested that things > that usually do not matter were bottleneck like creating scorers or > deciding whether a query should be cached. But trying to fix it didn't > really help as there are lots of things that we need to do to decide how to > run a query that run in O(num_segments * num_clauses) > > I'm confused why MinShouldMatchSumScorer would be used when minShouldMatch > is 0 or 1. DisjunctionSumScorer should be used instead for such values of > minShouldMatch? > > Le jeu. 7 juin 2018 à 19:38, Michael McCandless <luc...@mikemccandless.com> > a écrit : > >> Doesn't BQ rewrite itself if it has only one clause? >> >> Or maybe if there were more than one clause, and then all but one of them >> had null scorers (on SHOULD clauses) you could wind up in that state? >> >> Mike McCandless >> >> http://blog.mikemccandless.com >> >> On Thu, Jun 7, 2018 at 1:21 PM, David Smiley <david.w.smi...@gmail.com> >> wrote: >> >>> I'm working with some folks who did some profiling and noticed that >>> ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says >>> only to call it if necessary. Unfortunately, a BooleanQuery is going to >>> call cost() (via BooleanWeight.scorer() even if ultimately no Query in the >>> tree cares what the cost is. I'm not sure if that's a perf bug or not; >>> it's hard to tell. >>> >>> The expensive part of cost() for Boolean2ScorerSupplier is over in >>> MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even >>> if trivially numScorers == 1. That's a weird case... why do we even need a >>> Boolean2ScorerSupplier around one clause; couldn't that clause be returned >>> from the outer weight, BooleanWeight.scorerSupplier() close to the end as >>> an optimization? I could file an issue. >>> >>> ~ David >>> -- >>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker >>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: >>> http://www.solrenterprisesearchserver.com >>> >> >> -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com