Re: MaxScoreBulkScorer increased latency for an extreme test case (many SHOULD and each SHOULD clause matches all docs)

Rui Wu Wed, 18 Sep 2024 17:48:43 -0700

Hi Adrien,

Thanks for your help and putting up a fix!


Another experiment I ran without your PR: if the 12 SHOULD clauses are
changed to 12 MUST clauses, the problem is the same: Lucene 9.11 collects
3.6M docs, while Lucene 9.7 collects 1001 docs. Does this data point align
with how MaxScoreBulkScorer works?
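For reference, the 1001 figure can be reproduced with a hypothetical, simplified model of top-k collection with a total hit count threshold. This is not Lucene's TopScoreDocCollector. It assumes that every document scores 12 (12 constant-score clauses, each with an assumed default boost of 1.0) and uses the collector settings quoted later in this thread (101 hits, threshold 1,000). In the model, once more than 1,000 hits have been counted and the queue is full, the minimum competitive score becomes Math.nextUp of the lowest queued score, because ties are assumed to go to earlier doc IDs. A scorer that honors that bar stops after 1001 collected docs. The model says nothing about how SHOULD and MUST clauses are scored internally.

```java
import java.util.PriorityQueue;

// Hypothetical, simplified model of top-k collection with a total hit count
// threshold. This is NOT Lucene's TopScoreDocCollector, only an illustration.
public class EarlyTerminationModel {

    // Returns how many docs are collected before the (modeled) scorer stops.
    static int collectCount(int maxDoc, float docScore, float maxPossibleScore,
                            int numHits, int totalHitsThreshold) {
        PriorityQueue<Float> topK = new PriorityQueue<>(); // min-heap: lowest score on top
        float minCompetitiveScore = 0f;
        int collected = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            // Assumed skipping: stop once no doc can beat the current bar.
            if (maxPossibleScore < minCompetitiveScore) {
                break;
            }
            collected++;
            if (topK.size() < numHits) {
                topK.add(docScore);
            } else if (docScore > topK.peek()) {
                topK.poll();
                topK.add(docScore);
            }
            // Assumption: ties go to earlier doc IDs, so a later doc must score
            // strictly more than the lowest queued score to be competitive.
            if (collected > totalHitsThreshold && topK.size() == numHits) {
                float lowest = topK.peek();
                minCompetitiveScore = Math.max(minCompetitiveScore, Math.nextUp(lowest));
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        // 3.6M identical docs, each scoring 12 (assumed boost of 1.0 per clause).
        System.out.println(collectCount(3_600_000, 12f, 12f, 101, 1000)); // 1001
        // With an unreachable threshold, the bar is never raised: every doc is collected.
        System.out.println(collectCount(3_600_000, 12f, 12f, 101, Integer.MAX_VALUE)); // 3600000
    }
}
```

In this model, a collect count equal to the number of documents means the scorer never acted on the raised bar, which resembles the 3.6M count reported for Lucene 9.11.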

Thanks!

On Wed, Sep 18, 2024 at 1:51 AM Adrien Grand <[email protected]> wrote:

> Thank you, this last comment helped me understand the
> problem. I opened a PR at https://github.com/apache/lucene/pull/13800.
>
> On Tue, Sep 17, 2024 at 7:45 PM Rui Wu <[email protected]> wrote:
>
>> Another data point: in Lucene 9.7, this query (12 SHOULD clauses)
>> collected 1001 results, while in Lucene 9.11 the same query collected
>> all docs (a collect count of 3.6M).
>>
>> In Lucene 9.11, if the query has only one SHOULD clause, it collects 1001
>> results; if it has multiple clauses, it collects 3.6M results.
>>
>> On Tue, Sep 17, 2024 at 9:09 AM Rui Wu <[email protected]> wrote:
>>
>>> This query's latency increased from 14.65 ms to 20.90 ms.
>>>
>>> We use
>>> `TopScoreDocCollector.createSharedManager(/*batchSize*/ 101, /*searchAfterFieldDoc*/ null, /*hitsThreshold*/ 1000)`.
>>>
>>> Thanks a lot!
>>>
>>> On Tue, Sep 17, 2024 at 6:45 AM Adrien Grand <[email protected]> wrote:
>>>
>>>> Can you tell us how long this query used to take, and how long it
>>>> takes now?
>>>> Also, are you using IndexSearcher's default total hit count threshold of
>>>> 1,000, or are you passing a custom value to TopScoreDocCollectorManager?
>>>>
>>>> On Tue, Sep 17, 2024 at 10:14 AM Rui Wu <[email protected]> wrote:
>>>>
>>>>> Hi Adrien,
>>>>>
>>>>> Thanks for looking into this! Here are more screenshots of the
>>>>> flame graph. The original flame graph HTML files contain stack traces
>>>>> from our app, so I can't share them here.
>>>>> [image: Screenshot 2024-09-17 at 1.13.07 AM.png]
>>>>> [image: Screenshot 2024-09-17 at 1.12.01 AM.png]
>>>>>
>>>>> On Tue, Sep 17, 2024 at 1:00 AM Adrien Grand <[email protected]> wrote:
>>>>>
>>>>>> Hello Rui,
>>>>>>
>>>>>> We actually released a change that should make MaxScoreBulkScorer
>>>>>> faster on dense disjunctions in 9.8:
>>>>>> https://github.com/apache/lucene/pull/12444. Your benchmark case is
>>>>>> quite specific, though, as all clauses match all docs and produce
>>>>>> constant scores, so I would expect the scorer to quickly realize that it
>>>>>> can skip all remaining documents once it has scored the first k docs.
>>>>>> This makes me wonder if it bottlenecks on skipping blocks of documents
>>>>>> rather than on scoring them. Would you be able to share your whole flame
>>>>>> graph? It looks like it may be truncated at the top.
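Adrien's expectation can be illustrated with a hypothetical sketch of the MaxScore "essential vs. non-essential" split. This is a simplified model of the idea, not Lucene's MaxScoreBulkScorer code, which may differ in details such as clause ordering and floating-point handling. It assumes each ConstantScoreQuery clause has the default boost of 1.0, so each clause's maximum score is 1.0. Clauses are sorted by maximum score, and the longest prefix whose summed maximum scores stays below the minimum competitive score is non-essential: a document matching only those clauses cannot compete. Once the collector raises the bar to just above 12, all 12 clauses become non-essential, so no remaining document can compete and scoring can stop.

```java
import java.util.Arrays;

// Hypothetical sketch of the MaxScore "essential vs. non-essential" split.
// Not Lucene's MaxScoreBulkScorer code; just the core idea.
public class MaxScorePartition {

    // Returns how many clauses are non-essential: the longest prefix of clauses
    // (sorted by ascending max score) whose summed max scores stays below
    // minCompetitiveScore. A doc matching only those clauses cannot compete.
    static int nonEssentialCount(float[] clauseMaxScores, float minCompetitiveScore) {
        float[] sorted = clauseMaxScores.clone();
        Arrays.sort(sorted);
        double prefixSum = 0;
        int count = 0;
        for (float maxScore : sorted) {
            if (prefixSum + maxScore >= minCompetitiveScore) {
                break; // this clause and all remaining ones are essential
            }
            prefixSum += maxScore;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        float[] maxScores = new float[12];
        Arrays.fill(maxScores, 1.0f); // assumed boost of 1.0 per constant-score clause
        System.out.println(nonEssentialCount(maxScores, 0f));               // 0: all essential
        System.out.println(nonEssentialCount(maxScores, Math.nextUp(12f))); // 12: none essential
    }
}
```

When every clause is non-essential, even a document matching all clauses scores at most 12, which is below the bar, so a scorer built on this idea has nothing left to evaluate.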
>>>>>>
>>>>>> On Mon, Sep 16, 2024 at 10:01 PM Rui Wu <[email protected]> wrote:
>>>>>>
>>>>>>> Correction: The index has 3.6 million documents.
>>>>>>>
>>>>>>> On Mon, Sep 16, 2024 at 1:00 PM Rui Wu <[email protected]> wrote:
>>>>>>>
>>>>>>>> Dear experts,
>>>>>>>>
>>>>>>>> In our MongoDB Atlas Search performance regression test comparing
>>>>>>>> Lucene 9.7 and Lucene 9.11, we detected a 43% latency regression for
>>>>>>>> this query shape: 12 SHOULD clauses, each matching all of the documents.
>>>>>>>> Each SHOULD clause is wrapped in a ConstantScoreQuery.
>>>>>>>>
>>>>>>>> The index has 3.6 documents, and every document is identical:
>>>>>>>> {"path": ["1", "2", "3", ..., "12"]}
>>>>>>>> The query is a BooleanQuery of SHOULD "1", SHOULD "2", ...,
>>>>>>>> SHOULD "12".
>>>>>>>>
>>>>>>>> Our flame graphs show that most of the time in search() is spent in
>>>>>>>> the MaxScoreBulkScorer class:
>>>>>>>> [image: image.png]
>>>>>>>>
>>>>>>>> Is this extreme test case expected to be slow with
>>>>>>>> MaxScoreBulkScorer?
>>>>>>>>
>>>>>>>> Thanks a lot!
>>>>>>>>
>>>>>>>> Rui Wu
>>>>>>>> Lead Engineer, MongoDB
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Adrien
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Adrien
>>>>
>>>
>
> --
> Adrien
>
