shimpeko commented on PR #15659:
URL: https://github.com/apache/lucene/pull/15659#issuecomment-3902363180

   @jpountz Thanks for the feedback. I’ve dropped the idea of switching from 
`DisjunctionMaxBulkScorer` to `DisjunctionMaxScorer` when all child bulk 
scorers are `DefaultBulkScorer`. That approach has been reverted.
   
   Instead, this change makes `DisjunctionMaxBulkScorer` start with a window 
size of **1** and grow it **exponentially**.
   
   As you noted in 
[https://github.com/apache/lucene/pull/15659#issuecomment-3841311138](https://github.com/apache/lucene/pull/15659#issuecomment-3841311138),
 _“DisjunctionMaxBulkScorer tracks the min competitive score and passes it to 
its sub clauses”_. However, propagating the min competitive score only at fixed 
4096-doc window boundaries can come too late for queries such as "constant_score + 
top-N (with small N) + high match rate". In these cases, the delayed 
propagation causes a regression compared to `DisjunctionMaxScorer`, which 
propagates the min competitive score on a per-doc basis.
   
   The new approach addresses that problem by exponentially increasing the 
window size from 1 up to 4096, ensuring that min competitive score propagation 
happens early enough, while still preserving the throughput advantages of bulk 
scoring. I believe this approach is more robust than relying on scorer-type 
checks such as `if (bs instanceof Weight.DefaultBulkScorer dbs)`.
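   
   To make the window schedule concrete, here is a minimal, self-contained sketch. It is not the code in this PR: the class `WindowGrowthSketch` and the method `windows` are hypothetical names, and the growth factor of 2 is an assumption (the description above only says the size grows exponentially up to 4096). It just lists the `[start, end)` doc ID ranges that such a schedule would produce.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of an exponentially growing scoring window. */
final class WindowGrowthSketch {
    // Upper bound on the window size, as stated in the comment above.
    static final int MAX_WINDOW_SIZE = 4096;

    /**
     * Returns the [start, end) windows that cover doc IDs 0..maxDoc-1.
     * Assumption: the window size starts at 1 and doubles after each window.
     */
    static List<int[]> windows(int maxDoc) {
        List<int[]> result = new ArrayList<>();
        int windowSize = 1;
        int start = 0;
        while (start < maxDoc) {
            // Clamp the last window to maxDoc; use long to avoid int overflow.
            int end = (int) Math.min((long) start + windowSize, maxDoc);
            result.add(new int[] {start, end});
            start = end;
            // Grow the window for the next round, capped at MAX_WINDOW_SIZE.
            windowSize = Math.min(windowSize * 2, MAX_WINDOW_SIZE);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] w : windows(10_000)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

   Under the doubling assumption, the first window boundaries fall at doc IDs 1, 3, 7, 15, and so on, so an updated min competitive score could be handed to the sub-clauses after the very first window instead of after 4096 docs. Once the size reaches 4096, every later window is the same size as in the fixed-window scheme.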
   
   Below are the benchmark results, which demonstrate the improvement:
   
   ```
                       Task  QPS baseline   StdDev  QPS my_modified_version   StdDev               Pct diff  p-value
                   PKLookup        240.92  (16.3%)                   242.22  (15.1%)   0.5% ( -26% -   38%)    0.732
           DismaxOrHighHigh        264.57  (15.7%)                   270.59  (15.1%)   2.3% ( -24% -   39%)    0.141
            DismaxOrHighMed        214.97  (16.1%)                   219.96  (15.0%)   2.3% ( -24% -   39%)    0.135
         FilteredDismaxTerm        130.87  (13.0%)                   134.62  (11.0%)   2.9% ( -18% -   30%)    0.018
   FilteredDismaxOrHighHigh         58.86  (16.1%)                    60.69  (11.9%)   3.1% ( -21% -   37%)    0.028
    FilteredDismaxOrHighMed        141.60  (14.2%)                   147.28  (12.0%)   4.0% ( -19% -   35%)    0.002
                 DismaxTerm        794.99  (16.2%)                   831.28  (14.1%)   4.6% ( -22% -   41%)    0.003
                   CSTerm20        560.08  (26.9%)                   596.16  (22.1%)   6.4% ( -33% -   75%)    0.009
              DisMaxCsTerm1       1400.32  (14.2%)                  1974.02  (14.6%)  41.0% (  10% -   81%)    0.000
             DisMaxCSTerm20        205.76  (18.9%)                   322.79  (29.3%)  56.9% (   7% -  129%)    0.000
   ```
   
   baseline: 16f83c98c4ff0f17568b44121f5cdbfc42cf5fc3
   my_modified_version: fe86cebb57300d858012fe983fbe8461949aca35
   --iterations=200 --warmups=50
   
   As mentioned in 
[https://github.com/apache/lucene/pull/15659#issuecomment-3853027797](https://github.com/apache/lucene/pull/15659#issuecomment-3853027797),
 I’ve also added test cases that exercise a *dismax + constant_score* structure 
in luceneutil: 
[https://github.com/shimpeko/luceneutil/pull/1/changes](https://github.com/shimpeko/luceneutil/pull/1/changes).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

