shimpeko commented on PR #15659: URL: https://github.com/apache/lucene/pull/15659#issuecomment-3902363180
@jpountz Thanks for the feedback.

I’ve dropped the idea of switching from `DisjunctionMaxBulkScorer` to `DisjunctionMaxScorer` when all child bulk scorers are `DefaultBulkScorer`, and that change has been reverted. Instead, this change makes `DisjunctionMaxBulkScorer` start with a window size of **1** and grow it **exponentially**.

As you noted in [https://github.com/apache/lucene/pull/15659#issuecomment-3841311138](https://github.com/apache/lucene/pull/15659#issuecomment-3841311138), _“DisjunctionMaxBulkScorer tracks the min competitive score and passes it to its sub clauses”_. However, propagating the min competitive score only at fixed 4096-doc window boundaries is sometimes too late for queries such as "constant_score + top-N (with small N) + high match rate". In these cases, the delayed propagation causes a regression compared to `DisjunctionMaxScorer`, which propagates the min competitive score on a per-doc basis.

The new approach addresses that problem by growing the window size exponentially from 1 up to 4096, so that the min competitive score is propagated early enough while the throughput advantages of bulk scoring are preserved. I believe this approach is more robust than relying on scorer-type checks such as `if (bs instanceof Weight.DefaultBulkScorer dbs)`.
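To make the window schedule concrete, here is a minimal, self-contained sketch of how the window boundaries could evolve. It is not the code from the PR: the class name `GrowingWindowSketch`, the method `windows`, and the growth factor of 2 are assumptions for illustration (the comment above only says the size grows exponentially from 1 up to 4096), and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: computes doc-ID windows that start at size 1 and double up to a cap. */
class GrowingWindowSketch {
  // Upper bound on the window size, matching the 4096-doc window mentioned above.
  static final int MAX_WINDOW_SIZE = 4096;

  /** Returns the [start, end) windows that cover the doc-ID range [min, max). */
  static List<int[]> windows(int min, int max) {
    List<int[]> out = new ArrayList<>();
    int windowSize = 1; // start small so the first boundary is reached after one doc
    int start = min;
    while (start < max) {
      // Use long arithmetic so start + windowSize cannot overflow near Integer.MAX_VALUE.
      int end = (int) Math.min((long) start + windowSize, (long) max);
      out.add(new int[] {start, end});
      // In a real bulk scorer, this boundary is where the sub-scorers would score
      // [start, end) and the updated min competitive score would be handed to them.
      start = end;
      // Assumed growth factor of 2, capped at MAX_WINDOW_SIZE.
      windowSize = Math.min(windowSize << 1, MAX_WINDOW_SIZE);
    }
    return out;
  }

  public static void main(String[] args) {
    List<int[]> w = windows(0, 10_000);
    System.out.println("number of windows: " + w.size());
    for (int i = 0; i < Math.min(5, w.size()); i++) {
      System.out.println("[" + w.get(i)[0] + ", " + w.get(i)[1] + ")");
    }
  }
}
```

Under these assumptions, a 10,000-doc range is split into 14 windows instead of the 3 that fixed 4096-doc windows would produce, and 12 of those boundaries fall within the first 4095 docs, which is where early propagation of the min competitive score matters most.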
Below are the benchmark results, which demonstrate the improvement:

```
Task                      QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                 p-value
PKLookup                        240.92  (16.3%)                   242.22  (15.1%)      0.5% ( -26% -   38%)   0.732
DismaxOrHighHigh                264.57  (15.7%)                   270.59  (15.1%)      2.3% ( -24% -   39%)   0.141
DismaxOrHighMed                 214.97  (16.1%)                   219.96  (15.0%)      2.3% ( -24% -   39%)   0.135
FilteredDismaxTerm              130.87  (13.0%)                   134.62  (11.0%)      2.9% ( -18% -   30%)   0.018
FilteredDismaxOrHighHigh         58.86  (16.1%)                    60.69  (11.9%)      3.1% ( -21% -   37%)   0.028
FilteredDismaxOrHighMed         141.60  (14.2%)                   147.28  (12.0%)      4.0% ( -19% -   35%)   0.002
DismaxTerm                      794.99  (16.2%)                   831.28  (14.1%)      4.6% ( -22% -   41%)   0.003
CSTerm20                        560.08  (26.9%)                   596.16  (22.1%)      6.4% ( -33% -   75%)   0.009
DisMaxCsTerm1                  1400.32  (14.2%)                  1974.02  (14.6%)     41.0% (  10% -   81%)   0.000
DisMaxCSTerm20                  205.76  (18.9%)                   322.79  (29.3%)     56.9% (   7% -  129%)   0.000
```

baseline: 16f83c98c4ff0f17568b44121f5cdbfc42cf5fc3

my_modified_version: fe86cebb57300d858012fe983fbe8461949aca35

--iterations=200 --warmups=50

As mentioned in [https://github.com/apache/lucene/pull/15659#issuecomment-3853027797](https://github.com/apache/lucene/pull/15659#issuecomment-3853027797), I’ve also added test cases that exercise a *dismax + constant_score* structure in luceneutil: [https://github.com/shimpeko/luceneutil/pull/1/changes](https://github.com/shimpeko/luceneutil/pull/1/changes).
