Re: MaxScoreBulkScorer increased latency for a extreme test case (many SHOULD and each SHOULD clause matches all docs)

Adrien Grand Fri, 20 Sep 2024 02:30:53 -0700

This suggests that BlockMaxConjunctionBulkScorer has a similar issue, I'll
look into it too.


On Thu, Sep 19, 2024 at 2:48 AM Rui Wu <rui...@mongodb.com> wrote:

> Hi Adrien,
>
> Thanks for your help and putting up a fix!
>
> I ran another experiment without your PR: if the 12 SHOULD clauses are
> changed to 12 MUST clauses, the problem persists. It collects 3.6M docs
> on Lucene911 but 1001 docs on Lucene97. Does this data point align with how
> MaxScoreBulkScorer works?
>
> Thanks!
>
> On Wed, Sep 18, 2024 at 1:51 AM Adrien Grand <jpou...@gmail.com> wrote:
>
>> Thank you, this last comment was helpful and helped me understand the
>> problem. I opened a PR at https://github.com/apache/lucene/pull/13800.
>>
>> On Tue, Sep 17, 2024 at 7:45 PM Rui Wu <rui...@mongodb.com> wrote:
>>
>>> One more data point: in Lucene97, this query (12 SHOULD clauses)
>>> collected 1001 results, while in Lucene911 the same query
>>> collected all docs (3.6M collect count).
>>>
>>> In Lucene911, if the query has only one SHOULD clause, it collects 1001
>>> results. If the query has multiple clauses, it collects 3.6M results.
>>>
>>> On Tue, Sep 17, 2024 at 9:09 AM Rui Wu <rui...@mongodb.com> wrote:
>>>
>>>> This query's latency increased from 14.65 ms to 20.90 ms.
>>>>
>>>> We use `TopScoreDocCollector.createSharedManager(/*batchSize*/ 101,
>>>> /*searchAfterFieldDoc*/ null, /*hitsThreshold*/ 1000);`
>>>>
>>>> Thanks a lot!
>>>>
>>>> On Tue, Sep 17, 2024 at 6:45 AM Adrien Grand <jpou...@gmail.com> wrote:
>>>>
>>>>> Can you tell us how long this query used to take, and how long it
>>>>> takes now?
>>>>> Also are you using IndexSearcher's default total hit count threshold
>>>>> of 1,000, or are you passing a custom value to 
>>>>> TopScoreDocCollectorManager?
>>>>>
>>>>> On Tue, Sep 17, 2024 at 10:14 AM Rui Wu <rui...@mongodb.com> wrote:
>>>>>
>>>>>> Hi Adrien,
>>>>>>
>>>>>> Thanks for looking into this! Here are more screenshots of the
>>>>>> flamegraph. The original flamegraph HTML files contain stack traces
>>>>>> from our app, so I am not sharing them here.
>>>>>> [image: Screenshot 2024-09-17 at 1.13.07 AM.png][image: Screenshot
>>>>>> 2024-09-17 at 1.12.01 AM.png]
>>>>>>
>>>>>> On Tue, Sep 17, 2024 at 1:00 AM Adrien Grand <jpou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello Rui,
>>>>>>>
>>>>>>> We actually released a change that should make MaxScoreBulkScorer
>>>>>>> faster on dense disjunctions in 9.8:
>>>>>>> https://github.com/apache/lucene/pull/12444. Your benchmark case is
>>>>>>> quite specific though, as all clauses match all docs and produce
>>>>>>> constant scores, so I would expect the scorer to quickly realize that
>>>>>>> it can skip all documents once it has scored the first k docs. This
>>>>>>> makes me wonder if it bottlenecks on skipping blocks of documents
>>>>>>> rather than on scoring them. Would you be able to share your whole
>>>>>>> flame graph? It looks like it may be truncated at the top.
>>>>>>>
>>>>>>> On Mon, Sep 16, 2024 at 10:01 PM Rui Wu <rui...@mongodb.com> wrote:
>>>>>>>
>>>>>>>> Correction: The index has 3.6 million documents.
>>>>>>>>
>>>>>>>> On Mon, Sep 16, 2024 at 1:00 PM Rui Wu <rui...@mongodb.com> wrote:
>>>>>>>>
>>>>>>>>> Dear experts,
>>>>>>>>>
>>>>>>>>> In our MongoDB Atlas Search performance regression test between
>>>>>>>>> Lucene 9.7 and Lucene 9.11, we detected a 43% latency regression in
>>>>>>>>> this query shape:
>>>>>>>>> 12 SHOULD clauses, where each clause matches all of the documents.
>>>>>>>>> Each SHOULD clause is wrapped in a ConstantScoreQuery.
>>>>>>>>>
>>>>>>>>> The index has 3.6 documents, and every document is identical:
>>>>>>>>> Every document is {"path": ["1", "2", "3" ... "12"]}
>>>>>>>>> The query shape is a BooleanQuery of SHOULD "1", SHOULD "2", ...
>>>>>>>>> SHOULD "12".
>>>>>>>>>
>>>>>>>>> Our flamegraphs show that most of the time in search() is spent on
>>>>>>>>> the MaxScoreBulkScorer class:
>>>>>>>>> [image: image.png]
>>>>>>>>>
>>>>>>>>> Is this extreme test case expected to be slow with
>>>>>>>>> MaxScoreBulkScorer?
>>>>>>>>>
>>>>>>>>> Thanks a lot!
>>>>>>>>>
>>>>>>>>> Rui Wu
>>>>>>>>> Lead Engineer, MongoDB
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Adrien
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Adrien
>>>>>
>>>>
>>
>> --
>> Adrien
>>
>

-- 
Adrien
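
The collect counts reported in this thread (1001 on Lucene97 versus 3.6M on
Lucene911) can be reasoned about with a toy model. The sketch below is not
Lucene code and uses no Lucene classes. The class name, method name, and
parameters are hypothetical, and it rests on two assumptions: every document
gets the same constant score, and the collector raises the minimum competitive
score to just above the lowest score in the top-k queue once more than
hitsThreshold hits have been collected. Under those assumptions, nothing after
hit 1001 can be competitive.

```java
import java.util.PriorityQueue;

// Toy model only: a hypothetical stand-in for a top-k collector with a
// total-hits threshold. It does not use or reproduce Lucene's classes.
final class ConstantScorePruningSketch {

    /**
     * Returns how many docs are collected when every doc has the same score.
     * numDocs, topK and hitsThreshold mirror the numbers quoted in the thread.
     */
    static int collectCount(int numDocs, int topK, int hitsThreshold, float constantScore) {
        PriorityQueue<Float> topScores = new PriorityQueue<>(); // min-heap of the best topK scores
        float minCompetitiveScore = 0f;
        int collected = 0;
        for (int doc = 0; doc < numDocs; doc++) {
            if (constantScore < minCompetitiveScore) {
                // Every remaining doc has the same score, so none can compete.
                // A real scorer would skip them; the model simply stops.
                break;
            }
            collected++;
            if (topScores.size() < topK) {
                topScores.add(constantScore);
            } else if (constantScore > topScores.peek()) {
                topScores.poll();
                topScores.add(constantScore);
            }
            // Assumption: once the hit count exceeds the threshold and the
            // queue is full, ties with the current k-th score stop being
            // competitive, so the bar moves just above that score.
            if (collected > hitsThreshold && topScores.size() == topK) {
                minCompetitiveScore = Math.nextUp(topScores.peek());
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        // 3.6M identical docs, top 101 hits, threshold 1000, 12 constant-score clauses.
        System.out.println(collectCount(3_600_000, 101, 1000, 12f));
    }
}
```

With the thread's numbers the model stops after 1001 docs, which matches the
Lucene97 collect count. A collect count of 3.6M means the pruning step in the
model never takes effect for this query shape.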
