sgup432 commented on issue #16049:
URL: https://github.com/apache/lucene/issues/16049#issuecomment-4436146557
I verified this with a micro-benchmark; details below:
### Micro benchmark Details
**Setup:**
- 10M docs (single segment, force-merged), each with a category keyword field
- Query: `bool { must: match_all, must_not: [type_a, type_b, ..., type_k] }` (11 MUST_NOT terms)
- Values assigned round-robin across 11 categories (type_a through type_k), creating maximally interleaved posting lists
- JMH: 1 fork, 4GB heap, 3 warmup × 5s, 5 measurement × 5s
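To make "maximally interleaved" concrete, here is a minimal plain-Java sketch of the round-robin assignment. It is not the benchmark code and has no Lucene dependency; the class and method names (`RoundRobinPostings`, `build`, `longestRun`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: models posting lists as sorted int arrays, not Lucene postings. */
final class RoundRobinPostings {

    /** Returns one sorted doc ID array per category, with docs assigned round-robin. */
    static List<int[]> build(int numDocs, int numCategories) {
        List<int[]> postings = new ArrayList<>();
        for (int c = 0; c < numCategories; c++) {
            // Category c receives docs c, c + K, c + 2K, ... below numDocs.
            int size = (numDocs - c + numCategories - 1) / numCategories;
            int[] docs = new int[size];
            for (int i = 0; i < size; i++) {
                docs[i] = c + i * numCategories;
            }
            postings.add(docs);
        }
        return postings;
    }

    /** Length of the longest run of consecutive doc IDs within a single posting list. */
    static int longestRun(int[] docs) {
        int best = docs.length == 0 ? 0 : 1;
        int current = best;
        for (int i = 1; i < docs.length; i++) {
            current = (docs[i] == docs[i - 1] + 1) ? current + 1 : 1;
            best = Math.max(best, current);
        }
        return best;
    }
}
```

With 11 categories, `longestRun` is 1 for every list: no single term ever matches two adjacent doc IDs, while the union of all 11 lists covers every doc.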
**Before fix:**
```
Benchmark                                  (numDocs)  (numMustNotTerms)  Mode  Cnt   Score    Error  Units
MustNotInterleavedBenchmark.searchMustNot   10000000                 11  avgt    3  94.894 ± 22.269  ms/op
```
**JFR profile (before fix) — top leaf frames:**
```
325 samples DisiPriorityQueueN.downHeap()
107 samples DisiPriorityQueueN.downHeap()
44 samples DisjunctionDISIApproximation.docIDRunEnd()
37 samples DisiPriorityQueueN.topList()
36 samples DisjunctionDISIApproximation.docIDRunEnd()
35 samples DisjunctionDISIApproximation.topList()
35 samples DisiPriorityQueueN.topList()
```
This confirms that the docIDRunEnd() → topList() → computeTopList() path is a significant contributor, on top of the heap operations.
**After fix (skip docIDRunEnd() for disjunctions with >2 sub-iterators):**
```
Benchmark                                  (numDocs)  (numMustNotTerms)  Mode  Cnt   Score   Error  Units
MustNotInterleavedBenchmark.searchMustNot   10000000                 11  avgt    3  66.113 ± 9.443  ms/op
```
**~30% improvement**
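For reference, the ~30% figure comes from the two mean scores reported by JMH. A tiny sketch of the arithmetic (the helper name `relativeImprovement` is just for illustration):

```java
/** Illustration only: relative improvement between two average times (lower is better). */
final class ImprovementCheck {

    /** Returns the fraction of time saved, e.g. 0.30 for a 30% reduction. */
    static double relativeImprovement(double beforeMs, double afterMs) {
        return (beforeMs - afterMs) / beforeMs;
    }
}
```

Calling `ImprovementCheck.relativeImprovement(94.894, 66.113)` gives roughly 0.303, i.e. about 30% less time per operation. This uses the means only and ignores the reported error margins.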
**JFR profile (after fix):**
```
206 samples DisiPriorityQueueN.downHeap()
191 samples DisiPriorityQueueN.downHeap()
88 samples Lucene104PostingsReader$BlockPostingsEnum.nextDoc()
86 samples DisjunctionDISIApproximation.nextDoc()
```
topList(), computeTopList(), and docIDRunEnd() are completely eliminated from the hot path.
The remaining cost is the expected O(log K) heap maintenance per excluded doc, the same as the Lucene 9 behavior.
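
To illustrate where the O(log K) per-doc cost comes from, here is a simplified sketch of a heap-based disjunction over K sorted doc ID lists. It is not Lucene's `DisiPriorityQueue` or `DisjunctionDISIApproximation`; it swaps in `java.util.PriorityQueue` and plain int arrays, and the names `HeapDisjunction` and `Cursor` are hypothetical.

```java
import java.util.PriorityQueue;

/** Illustration only: K-way union of sorted, non-negative doc ID arrays using a binary heap. */
final class HeapDisjunction {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** A position inside one posting list. */
    private static final class Cursor {
        final int[] docs;
        int pos = 0;

        Cursor(int[] docs) {
            this.docs = docs;
        }

        int doc() {
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    private final PriorityQueue<Cursor> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
    private int current = -1;

    HeapDisjunction(int[][] postings) {
        for (int[] p : postings) {
            if (p.length > 0) {
                heap.add(new Cursor(p));
            }
        }
    }

    /** Returns the next doc ID present in any list, or NO_MORE_DOCS when exhausted. */
    int nextDoc() {
        // Advance every cursor still positioned on the current doc.
        // Each poll/add pair costs O(log K) for K sub-iterators.
        while (!heap.isEmpty() && heap.peek().doc() <= current) {
            Cursor top = heap.poll();
            top.pos++; // mutate only while the cursor is outside the heap
            if (top.doc() != NO_MORE_DOCS) {
                heap.add(top);
            }
        }
        current = heap.isEmpty() ? NO_MORE_DOCS : heap.peek().doc();
        return current;
    }
}
```

With round-robin data, every doc ID belongs to exactly one list, so each `nextDoc()` call advances one cursor and pays one heap reordering. That matches the `downHeap()` samples that remain at the top of the profile.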