Re: [PR] Optimize cached-filter conjunction batching [lucene]

via GitHub Tue, 02 Jun 2026 05:32:47 -0700


costin commented on PR #16146:
URL: https://github.com/apache/lucene/pull/16146#issuecomment-4602395093


   > Shouldn't the query cache be going through 
ConstantScoreScorerSupplier/ConstantScoreBulkScorer now?
   
   I've reverted the `instanceof` check, kept `DefaultBulkScorer` generic, and 
routed the relevant no-score fallback through `ConstantScoreBulkScorer`.
   This avoids the earlier `DefaultBulkScorer` special case while still 
enabling batching for cached bitset filters.
   
   I've moved the gating into `BitSetConjunctionDISI.intoBitSet()`, which 
decides locally whether the bulk bitset path is worth using: sparse leads fall 
back to per-doc iteration, while denser leads use the bulk masking path.
   I've put two gates in place:
     - Gate 1: `ConstantScoreBulkScorer` (windowed): conjunction matches ≥ 8K docs
     - Gate 2: `BitSetConjunctionDISI.intoBitSet()`: lead matches ≥ 4K docs
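   
   As a rough illustration of how the two gates could compose, here is a 
minimal standalone sketch. It is not the code in this PR: the class, method, 
and enum names are hypothetical, reading "8K" and "4K" as 8192 and 4096 is an 
assumption, and so is the idea that Gate 2 is only consulted once Gate 1 has 
passed.

```java
/** Hypothetical sketch of the two density gates; not the PR's actual code. */
final class ConjunctionGates {
    // Assumption: "8K" and "4K" are read as 8192 and 4096 docs.
    static final long WINDOWED_MIN_CONJUNCTION_COST = 8192;
    static final long BULK_MASK_MIN_LEAD_COST = 4096;

    /** The three outcomes this sketch distinguishes (names are illustrative). */
    enum Path { PER_DOC, WINDOWED_PER_DOC, WINDOWED_BULK_MASK }

    /** Gate 1: is the conjunction dense enough for windowed bulk scoring? */
    static boolean passesWindowedGate(long conjunctionCost) {
        return conjunctionCost >= WINDOWED_MIN_CONJUNCTION_COST;
    }

    /** Gate 2: is the lead dense enough for the bulk masking path? */
    static boolean passesBulkMaskGate(long leadCost) {
        return leadCost >= BULK_MASK_MIN_LEAD_COST;
    }

    /** Assumed composition: Gate 2 only matters once Gate 1 has passed. */
    static Path choose(long conjunctionCost, long leadCost) {
        if (!passesWindowedGate(conjunctionCost)) {
            return Path.PER_DOC;
        }
        return passesBulkMaskGate(leadCost)
            ? Path.WINDOWED_BULK_MASK
            : Path.WINDOWED_PER_DOC;
    }
}
```

   With these assumed thresholds, `ConjunctionGates.choose(10_000, 100)` 
passes the first gate but not the second, so the sparse lead still iterates 
per doc.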
   
   The goal is to avoid divergence between the conjunction cost and the lead 
cost (e.g. the conjunction has 10K matching docs but `BitSetConjunctionDISI` 
has a very sparse iterator that is better suited to per-doc iteration).
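   
   For readers less familiar with the two strategies being traded off, the 
sketch below contrasts them using `java.util.BitSet` as a stand-in. This is an 
assumption made for illustration only: the PR works with Lucene's own bitset 
and iterator types, and `MaskingSketch` and its method names are hypothetical.

```java
import java.util.BitSet;

/** Illustrative only: java.util.BitSet stands in for Lucene's bitset types. */
final class MaskingSketch {
    /** Per-doc path: visit each set bit of the lead and probe the filter. */
    static BitSet perDoc(BitSet lead, BitSet filter) {
        BitSet out = new BitSet();
        for (int doc = lead.nextSetBit(0); doc >= 0; doc = lead.nextSetBit(doc + 1)) {
            if (filter.get(doc)) {
                out.set(doc);
            }
        }
        return out;
    }

    /** Bulk path: intersect the two bitsets word by word (64 docs per word). */
    static BitSet bulkMask(BitSet lead, BitSet filter) {
        BitSet out = (BitSet) lead.clone();
        out.and(filter);
        return out;
    }
}
```

   Both methods return the same set of docs. The per-doc loop does work 
proportional to the number of lead matches, while the bulk AND does work 
proportional to the size of the doc ID range, which is why a density gate on 
the lead is a natural way to choose between them.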
   
   I've updated the benchmark results. Here we're testing FILTER + SHOULD + 
SHOULD (vs. FILTER + FILTER in the previous one) to activate the 
`filteredOptionalBulkScorer()` path, and I've added a test for the query shape 
where this applies:
   
   cached = filter pre-warmed in LRUQueryCache as FixedBitSet
   uncached = filter evaluated from postings each time
   Query: `FILTER(cached) + SHOULD(a) + SHOULD(b), minShouldMatch=1, 
COMPLETE_NO_SCORES`
   
   Benchmark run on AMD EPYC c5a.2xlarge, AVX2, JDK 25
   
     | lead | filter | baseline (ops/s) | cached (ops/s) | cached ratio | baseline (ops/s) | uncached (ops/s) | uncached ratio |
     |----|------|------------------:|---------------:|-------------:|------------------:|-----------------:|---------------:|
     | 0.001 | 0.10 | 69,051 | 68,593 | ~1.0x | 7,468 | 8,050 | ~1.0x |
     | 0.001 | 0.50 | 59,101 | 59,414 | ~1.0x | 20,071 | 19,317 | ~1.0x |
     | 0.005 | 0.10 | 15,981 | 15,956 | ~1.0x | 3,874 | 3,524 | ~1.0x |
     | 0.005 | 0.50 | 10,196 | 10,465 | ~1.0x | 4,738 | 4,435 | ~1.0x |
     | 0.01 | 0.10 | 6,036 | 10,938 | **1.81x** | 2,535 | 2,452 | ~1.0x |
     | 0.01 | 0.50 | 4,194 | 13,425 | **3.20x** | 2,888 | 2,879 | ~1.0x |
     | 0.02 | 0.10 | 2,735 | 8,361 | **3.06x** | 1,542 | 1,492 | ~1.0x |
     | 0.02 | 0.50 | 1,994 | 9,600 | **4.81x** | 1,578 | 1,569 | ~1.0x |
     | 0.03 | 0.10 | 1,812 | 6,622 | **3.65x** | 1,114 | 1,151 | ~1.0x |
     | 0.03 | 0.50 | 1,334 | 8,259 | **6.19x** | 1,097 | 1,101 | ~1.0x |
     | 0.10 | 0.10 | 546 | 3,464 | **6.35x** | 447 | 2,010 | **4.50x** |
     | 0.10 | 0.50 | 422 | 3,427 | **8.12x** | 375 | 2,490 | **6.64x** |
     | 0.50 | 0.10 | 382 | 4,847 | **12.68x** | 380 | 2,293 | **6.03x** |
     | 0.50 | 0.50 | 79 | 4,889 | **61.91x** | 85 | 3,281 | **38.53x** |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

