costin commented on PR #16146:
URL: https://github.com/apache/lucene/pull/16146#issuecomment-4602395093
> Shouldn't the query cache be going through ConstantScoreScorerSupplier/ConstantScoreBulkScorer now?
I've reverted the `instanceof` check, kept `DefaultBulkScorer` generic, and
routed the relevant no-score fallback through `ConstantScoreBulkScorer`.
This avoids the earlier `DefaultBulkScorer` special case while still
enabling batching for cached bitset filters.
I've moved the gating into `BitSetConjunctionDISI.intoBitSet()`, which
decides locally whether the bulk bitset path is worth using: sparse leads fall
back to per-doc iteration, while denser leads use the bulk masking path.
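As a rough illustration of the two paths, the sketch below intersects a lead with a filter over one doc-ID window, once per doc and once by bulk masking. It is not the PR's code: `java.util.BitSet` stands in for Lucene's `FixedBitSet`, the lead is modelled as a bitset rather than an iterator, and the class and method names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch; java.util.BitSet is an assumed stand-in for FixedBitSet.
final class BitSetConjunctionSketch {

  // Per-doc path: visit each doc of the lead and probe the filter.
  // Work is proportional to the number of lead matches, so it suits sparse leads.
  static BitSet perDoc(BitSet lead, BitSet filter, int from, int to) {
    BitSet out = new BitSet(to - from);
    for (int doc = lead.nextSetBit(from); doc >= 0 && doc < to; doc = lead.nextSetBit(doc + 1)) {
      if (filter.get(doc)) {
        out.set(doc - from); // result bits are relative to the window start
      }
    }
    return out;
  }

  // Bulk path: copy the lead's window and AND it with the filter's window.
  // Work is proportional to the window size, so it pays off for denser leads.
  static BitSet bulkMask(BitSet lead, BitSet filter, int from, int to) {
    BitSet window = lead.get(from, to); // copy, rebased so bit 0 is doc `from`
    window.and(filter.get(from, to));
    return window;
  }
}
```

Both methods return the same window-relative bitset; only the amount of work differs, which is what makes a local density check a reasonable place to pick between them.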
I've put in place two gates:
- Gate 1: `ConstantScoreBulkScorer` (windowed): conjunction match ≥ 8K docs
- Gate 2: `BitSetConjunctionDISI.intoBitSet()`: lead match ≥ 4K docs
The goal is to cover cases where the conjunction cost and the lead cost
diverge (e.g. the conjunction has 10K matching docs but `BitSetConjunctionDISI`
has a very sparse lead iterator that is better suited to per-doc iteration).
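The two gates can be sketched as a small decision function. This is a simplified illustration, not the PR's code: the class, enum, and constant names are hypothetical, and reading "8K" and "4K" as 8192 and 4096 is an assumption.

```java
// Hypothetical sketch of the two-gate decision; all names are illustrative.
final class BulkPathGates {
  // Assumption: "8K" and "4K" are taken as 8192 and 4096 here.
  static final long CONJUNCTION_MIN_DOCS = 8 * 1024;
  static final long LEAD_MIN_DOCS = 4 * 1024;

  enum Path { PER_DOC, WINDOWED_PER_DOC_LEAD, WINDOWED_BULK_MASK }

  static Path choose(long conjunctionDocs, long leadDocs) {
    // Gate 1: the windowed scorer is only used for large enough conjunctions.
    if (conjunctionDocs < CONJUNCTION_MIN_DOCS) {
      return Path.PER_DOC;
    }
    // Gate 2: inside the window, a sparse lead still iterates per doc.
    if (leadDocs < LEAD_MIN_DOCS) {
      return Path.WINDOWED_PER_DOC_LEAD;
    }
    return Path.WINDOWED_BULK_MASK;
  }
}
```

With these assumed thresholds, `choose(10_000, 100)` passes the first gate but falls back at the second, which matches the divergence case described above.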
I've updated the benchmark results. This run tests FILTER + SHOULD + SHOULD
(vs. FILTER + FILTER in the previous one) to exercise the
`filteredOptionalBulkScorer()` path, and I've added a test for the query shape
where this applies:
cached = filter pre-warmed in LRUQueryCache as FixedBitSet
uncached = filter evaluated from postings each time
Query: `FILTER(cached) + SHOULD(a) + SHOULD(b), minShouldMatch=1, COMPLETE_NO_SCORES`
Benchmark run on an AMD EPYC c5a.2xlarge, AVX2, JDK 25
| lead | filter | baseline (ops/s) | cached (ops/s) | cached ratio | baseline (ops/s) | uncached (ops/s) | uncached ratio |
|----|------|------------------:|---------------:|-------------:|------------------:|-----------------:|---------------:|
| 0.001 | 0.10 | 69,051 | 68,593 | ~1.0x | 7,468 | 8,050 | ~1.0x |
| 0.001 | 0.50 | 59,101 | 59,414 | ~1.0x | 20,071 | 19,317 | ~1.0x |
| 0.005 | 0.10 | 15,981 | 15,956 | ~1.0x | 3,874 | 3,524 | ~1.0x |
| 0.005 | 0.50 | 10,196 | 10,465 | ~1.0x | 4,738 | 4,435 | ~1.0x |
| 0.01 | 0.10 | 6,036 | 10,938 | **1.81x** | 2,535 | 2,452 | ~1.0x |
| 0.01 | 0.50 | 4,194 | 13,425 | **3.20x** | 2,888 | 2,879 | ~1.0x |
| 0.02 | 0.10 | 2,735 | 8,361 | **3.06x** | 1,542 | 1,492 | ~1.0x |
| 0.02 | 0.50 | 1,994 | 9,600 | **4.81x** | 1,578 | 1,569 | ~1.0x |
| 0.03 | 0.10 | 1,812 | 6,622 | **3.65x** | 1,114 | 1,151 | ~1.0x |
| 0.03 | 0.50 | 1,334 | 8,259 | **6.19x** | 1,097 | 1,101 | ~1.0x |
| 0.10 | 0.10 | 546 | 3,464 | **6.35x** | 447 | 2,010 | **4.50x** |
| 0.10 | 0.50 | 422 | 3,427 | **8.12x** | 375 | 2,490 | **6.64x** |
| 0.50 | 0.10 | 382 | 4,847 | **12.68x** | 380 | 2,293 | **6.03x** |
| 0.50 | 0.50 | 79 | 4,889 | **61.91x** | 85 | 3,281 | **38.53x** |
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]