Re: ScorerSupplier and cost() avoidance

David Smiley Thu, 07 Jun 2018 13:35:01 -0700

I'll have to look further with the team on what these queries look like in
terms of relative cheapness and throughput.


Re MinShouldMatchSumScorer.cost() (the static method called by
Boolean2ScorerSupplier.computeCost): it's pretty easy to see that this
is called with minShouldMatch == 0 (or 1).  Set a conditional breakpoint on
it in your IDE and run TestBooleanMinShouldMatch.testRandomQueries.
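
To make the short-circuit idea concrete, here is a rough standalone sketch,
not the actual Lucene code. It assumes the min-should-match cost is estimated
as the sum of the (numScorers - minShouldMatch + 1) cheapest clause costs; the
class name, the method signature, and the use of java.util.PriorityQueue are
all hypothetical and chosen only for illustration. The point is that the
trivial cases (a single clause, or minShouldMatch of 0 or 1) can be answered
without allocating a queue at all.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Hypothetical illustration only; this is NOT Lucene's MinShouldMatchSumScorer.
final class MinShouldMatchCostSketch {

    // Assumption: the estimate is the sum of the (n - msm + 1) cheapest costs.
    static long cost(long[] costs, int minShouldMatch) {
        int numScorers = costs.length;
        if (numScorers == 0) {
            return 0L;
        }
        // Treat minShouldMatch == 0 like 1: at least one clause must match.
        int msm = Math.max(1, minShouldMatch);
        if (msm > numScorers) {
            throw new IllegalArgumentException("minShouldMatch > number of clauses");
        }
        int keep = numScorers - msm + 1;

        // Fast path 1: a single clause, so its own cost is the answer.
        if (numScorers == 1) {
            return costs[0];
        }
        // Fast path 2: msm <= 1 keeps every clause, so a plain sum is enough.
        if (keep == numScorers) {
            long sum = 0L;
            for (long c : costs) {
                sum += c;
            }
            return sum;
        }

        // General case: bounded max-heap holding the 'keep' cheapest costs.
        PriorityQueue<Long> pq = new PriorityQueue<>(keep, Collections.reverseOrder());
        for (long c : costs) {
            if (pq.size() < keep) {
                pq.add(c);
            } else if (c < pq.peek()) {
                pq.poll(); // evict the current largest of the kept costs
                pq.add(c);
            }
        }
        long sum = 0L;
        for (long c : pq) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        // One clause: no queue is allocated.
        System.out.println(cost(new long[] {5}, 1));
        // Three clauses, two must match: only the two cheapest are summed.
        System.out.println(cost(new long[] {3, 1, 2}, 2));
    }
}
```

Whether skipping the queue is measurable in practice depends on the
workload; the sketch only shows that the trivial cases need no allocation.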

On Thu, Jun 7, 2018 at 3:43 PM Adrien Grand <[email protected]> wrote:

> I suspect this could only show up as a bottleneck if they run very cheap
> queries (low cost) at a very high throughput? Is it the case? I've seen a
> couple of workloads like that in the past and profilers suggested that things
> that usually do not matter were bottlenecks, like creating scorers or
> deciding whether a query should be cached. But trying to fix it didn't
> really help as there are lots of things that we need to do to decide how to
> run a query that run in O(num_segments * num_clauses).
>
> I'm confused why MinShouldMatchSumScorer would be used when minShouldMatch
> is 0 or 1. DisjunctionSumScorer should be used instead for such values of
> minShouldMatch?
>
> On Thu, Jun 7, 2018 at 7:38 PM, Michael McCandless <[email protected]>
> wrote:
>
>> Doesn't BQ rewrite itself if it has only one clause?
>>
>> Or maybe if there were more than one clause, and then all but one of them
>> had null scorers (on SHOULD clauses) you could wind up in that state?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Thu, Jun 7, 2018 at 1:21 PM, David Smiley <[email protected]>
>> wrote:
>>
>>> I'm working with some folks who did some profiling and noticed that
>>> ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says
>>> only to call it if necessary. Unfortunately, a BooleanQuery is going to
>>> call cost() (via BooleanWeight.scorer()) even if ultimately no Query in the
>>> tree cares what the cost is.  I'm not sure if that's a perf bug or not;
>>> it's hard to tell.
>>>
>>> The expensive part of cost() for Boolean2ScorerSupplier is over in
>>> MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even
>>> if trivially numScorers == 1.  That's a weird case... why do we even need a
>>> Boolean2ScorerSupplier around one clause; couldn't that clause be returned
>>> from the outer weight, BooleanWeight.scorerSupplier() close to the end as
>>> an optimization?  I could file an issue.
>>>
>>> ~ David
>>> --
>>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>> http://www.solrenterprisesearchserver.com
>>>
>>
--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
