This suggests that BlockMaxConjunctionBulkScorer has a similar issue. I'll look into it too.

On Thu, Sep 19, 2024 at 2:48 AM Rui Wu <rui...@mongodb.com> wrote:

> Hi Adrien,
>
> Thanks for your help and putting up a fix!
>
> Another experiment I did without your PR: if the 12 SHOULD clauses are
> changed to 12 MUST clauses, the problem is the same: it collects 3.6M docs
> on Lucene911 but 1001 docs on Lucene97. Does this data point align with how
> MaxScoreBulkScorer works?
>
> Thanks!
>
> On Wed, Sep 18, 2024 at 1:51 AM Adrien Grand <jpou...@gmail.com> wrote:
>
>> Thank you, this last comment was helpful and helped me understand the
>> problem. I opened a PR at https://github.com/apache/lucene/pull/13800.
>>
>> On Tue, Sep 17, 2024 at 7:45 PM Rui Wu <rui...@mongodb.com> wrote:
>>
>>> Another data point: in Lucene97, this query (12 SHOULD clauses)
>>> collected 1001 results, while in Lucene911, this query (12 SHOULD clauses)
>>> collected all docs (3.6M collect count).
>>>
>>> In Lucene911, if the query has only one SHOULD clause, it collects 1001
>>> results. If the query has multiple clauses, it collects 3.6M results.
>>>
>>> On Tue, Sep 17, 2024 at 9:09 AM Rui Wu <rui...@mongodb.com> wrote:
>>>
>>>> This query latency increased from 14.65 to 20.90ms.
>>>>
>>>> We use the `TopScoreDocCollector.createSharedManager(/*batchSize*/ 101,
>>>> /*searchAfterFieldDoc*/ null, /*hitsThreshold*/ 1000); `
>>>>
>>>> Thanks a lot!
>>>>
>>>> On Tue, Sep 17, 2024 at 6:45 AM Adrien Grand <jpou...@gmail.com> wrote:
>>>>
>>>>> Can you tell us how long this query used to take, and how long it
>>>>> takes now?
>>>>> Also are you using IndexSearcher's default total hit count threshold
>>>>> of 1,000, or are you passing a custom value to
>>>>> TopScoreDocCollectorManager?
>>>>>
>>>>> On Tue, Sep 17, 2024 at 10:14 AM Rui Wu <rui...@mongodb.com> wrote:
>>>>>
>>>>>> Hi Adrien,
>>>>>>
>>>>>> Thanks for looking into this! Here are more screenshots of the
>>>>>> flamegraph. The original flamegraph HTMLs have stack traces from our
>>>>>> app so I don't share it here.
>>>>>>
>>>>>> [image: Screenshot 2024-09-17 at 1.13.07 AM.png]
>>>>>> [image: Screenshot 2024-09-17 at 1.12.01 AM.png]
>>>>>>
>>>>>> On Tue, Sep 17, 2024 at 1:00 AM Adrien Grand <jpou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello Rui,
>>>>>>>
>>>>>>> We actually released a change that should make MaxScoreBulkScorer
>>>>>>> faster on dense disjunctions in 9.8:
>>>>>>> https://github.com/apache/lucene/pull/12444. Your benchmark case is
>>>>>>> quite specific though as all clauses match all docs and produce
>>>>>>> constant scores, so I would expect the scorer to quickly realize that
>>>>>>> it can skip all documents once it's scored the first k docs. This
>>>>>>> makes me wonder if it bottlenecks on skipping blocks of documents
>>>>>>> rather than on scoring them. Would you be able to share your whole
>>>>>>> flame graph, it looks like it may be truncated at the top?
>>>>>>>
>>>>>>> On Mon, Sep 16, 2024 at 10:01 PM Rui Wu <rui...@mongodb.com> wrote:
>>>>>>>
>>>>>>>> Correction: The index has 3.6 million documents.
>>>>>>>>
>>>>>>>> On Mon, Sep 16, 2024 at 1:00 PM Rui Wu <rui...@mongodb.com> wrote:
>>>>>>>>
>>>>>>>>> Dear experts,
>>>>>>>>>
>>>>>>>>> In our MongoDB Atlas Search performance regression test between
>>>>>>>>> Lucene 9.7 and Lucene 9.11, we detect a 43% latency regression in
>>>>>>>>> this query shape:
>>>>>>>>> 12 SHOULD clauses, and each clause matches all of the documents.
>>>>>>>>> Each SHOULD clause is wrapped in ConstantScoreQuery.
>>>>>>>>>
>>>>>>>>> The index has 3.6 documents, and every document is identical:
>>>>>>>>> Every document is {"path": ["1", "2", "3" ... "12"]}
>>>>>>>>> The query shape is a BooleanQuery of SHOULD "1", SHOULD "2", ...
>>>>>>>>> SHOULD "12".
>>>>>>>>>
>>>>>>>>> Our flamegraphs show that most of the time in search() is spent on
>>>>>>>>> the MaxScoreBulkScorer class:
>>>>>>>>> [image: image.png]
>>>>>>>>>
>>>>>>>>> We wonder if this extreme test case is expected to be slow on
>>>>>>>>> MaxScoreBulkScorer?
>>>>>>>>>
>>>>>>>>> Thanks a lot!
>>>>>>>>>
>>>>>>>>> Rui Wu
>>>>>>>>> Lead Engineer, MongoDB
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Adrien
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Adrien
>>>>>
>>>>
>>
>> --
>> Adrien
>>
>

--
Adrien
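
For readers following the collect counts in this thread (1001 versus 3.6M), here is a toy model of why a constant-score query could stop after 1001 hits. It is a sketch under stated assumptions, not Lucene code: the class and method names are hypothetical, and the skipping rule (stop once the shared score falls below a minimum competitive score, which is raised after the hit threshold is exceeded) is only my reading of the expectation described above. The values 101 and 1000 are the ones passed to the collector manager in the thread.

```java
import java.util.PriorityQueue;

/** Toy model only, NOT Lucene code. All names are hypothetical. */
final class ConstantScoreSkipSketch {

    /**
     * Counts how many docs a top-k collector visits when every doc has the same score.
     * canSkip models a scorer that honors the minimum competitive score.
     */
    public static long collectCount(long numDocs, int topK, int hitsThreshold, boolean canSkip) {
        PriorityQueue<Float> topScores = new PriorityQueue<>(); // min-heap of the k best scores
        float minCompetitiveScore = 0f;
        final float score = 12f; // assumed: 12 constant-score clauses contributing 1.0 each
        long collected = 0;

        for (long doc = 0; doc < numDocs; doc++) {
            if (canSkip && score < minCompetitiveScore) {
                break; // all remaining docs share this score, so none can compete
            }
            collected++;
            if (topScores.size() < topK) {
                topScores.add(score);
            } else if (score > topScores.peek()) {
                topScores.poll();
                topScores.add(score);
            }
            // Once more than hitsThreshold hits were counted, only strictly better scores matter.
            if (collected > hitsThreshold && topScores.size() == topK) {
                minCompetitiveScore = Math.nextUp(topScores.peek());
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(collectCount(3_600_000L, 101, 1000, true));  // prints 1001
        System.out.println(collectCount(3_600_000L, 101, 1000, false)); // prints 3600000
    }
}
```

In this model the two runs differ only in whether the scorer acts on the minimum competitive score, which matches the shape of the numbers reported for Lucene97 and Lucene911, though it says nothing about where the real regression lives.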