prudhvigodithi opened a new pull request, #15446: URL: https://github.com/apache/lucene/pull/15446
### Description

This PR optimizes `PointRangeQuery` for intra-segment concurrent search by caching the `DocIdSet` at the segment level. When a large segment is split into multiple partitions for parallel processing, all partitions now share the result of a single BKD tree traversal instead of each partition repeating the same traversal.

The approach came out of the discussion in https://github.com/apache/lucene/pull/15383. The related issue for intra-segment concurrency with `PointRangeQuery` is https://github.com/apache/lucene/issues/13745.

### Problem

With intra-segment concurrency enabled, a single segment can be split into multiple partitions, each processed by a different thread. In the current implementation, each partition independently traverses the BKD tree and builds its own `DocIdSet`. The same BKD crawl is therefore repeated once per partition, which increases query latency (see https://github.com/apache/lucene/pull/13542#issuecomment-2332114836).

### Solution

Implement a segment-level cache that ensures the BKD tree is traversed only once per segment, with the resulting `DocIdSet` shared across all partitions:

1. **SegmentDocIdSetSupplier**: A new helper class that lazily builds and caches the `DocIdSet` for an entire segment.
2. **Segment-level cache**: A `ConcurrentHashMap<LeafReaderContext, SegmentDocIdSetSupplier>` in the `Weight` that ensures all partitions of the same segment share the same supplier.
3. **PartitionScorerSupplier**: A new `ScorerSupplier` implementation that references the shared cache and filters results to the partition's doc ID range.
4. **PartitionFilteredDocIdSetIterator**: A lightweight iterator wrapper that filters the shared full-segment `DocIdSet` so that it returns only docs within the partition's range.
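A minimal, self-contained sketch of the sharing pattern, assuming plain sorted `int[]` doc IDs in place of Lucene's `DocIdSet`, a `String` segment name in place of `LeafReaderContext`, and an `IntPredicate` in place of the BKD range check. The class names echo the components listed above, but they are simplified, hypothetical stand-ins, not the PR's actual code and not Lucene's API.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

public class PartitionCacheSketch {
  // Same sentinel value as Lucene's DocIdSetIterator.NO_MORE_DOCS; no Lucene classes are used here.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical stand-in for SegmentDocIdSetSupplier: builds the full-segment doc set once. */
  static final class SegmentDocIdSetSupplier {
    private final int maxDoc;
    private final IntPredicate matches; // stands in for the BKD range check
    private final AtomicInteger traversals; // counts how many "BKD crawls" actually ran
    private volatile int[] docs; // sorted matching doc IDs of the whole segment, built lazily

    SegmentDocIdSetSupplier(int maxDoc, IntPredicate matches, AtomicInteger traversals) {
      this.maxDoc = maxDoc;
      this.matches = matches;
      this.traversals = traversals;
    }

    int[] get() {
      int[] result = docs;
      if (result == null) {
        synchronized (this) {
          result = docs;
          if (result == null) { // only the first caller pays for the traversal
            traversals.incrementAndGet();
            result = IntStream.range(0, maxDoc).filter(matches).toArray();
            docs = result;
          }
        }
      }
      return result;
    }
  }

  /** Hypothetical stand-in for PartitionFilteredDocIdSetIterator: shared docs within [minDoc, maxDocExclusive). */
  static final class PartitionIterator {
    private final int[] docs;
    private final int maxDocExclusive;
    private int idx;

    PartitionIterator(int[] docs, int minDoc, int maxDocExclusive) {
      this.docs = docs;
      this.maxDocExclusive = maxDocExclusive;
      int pos = Arrays.binarySearch(docs, minDoc);
      this.idx = pos >= 0 ? pos : -pos - 1; // jump straight to the first doc >= minDoc
    }

    int nextDoc() {
      if (idx < docs.length && docs[idx] < maxDocExclusive) {
        return docs[idx++];
      }
      return NO_MORE_DOCS; // past the end of the partition (or of the segment)
    }
  }

  /** Hypothetical stand-in for the Weight that owns the per-segment cache. */
  static final class Weight {
    // Keyed by segment name here; the PR keys its map by LeafReaderContext.
    final ConcurrentHashMap<String, SegmentDocIdSetSupplier> cache = new ConcurrentHashMap<>();
    final AtomicInteger traversals = new AtomicInteger();

    PartitionIterator iterator(String segment, int segMaxDoc, IntPredicate matches, int minDoc, int maxDoc) {
      // computeIfAbsent only creates the cheap supplier; the expensive build runs in get(),
      // so the map's internal lock is never held during the traversal.
      SegmentDocIdSetSupplier supplier = cache.computeIfAbsent(
          segment, k -> new SegmentDocIdSetSupplier(segMaxDoc, matches, traversals));
      return new PartitionIterator(supplier.get(), minDoc, maxDoc);
    }
  }

  /** Counts matches in partition [minDoc, maxDoc) of a 100-doc segment "_0" matching doc % 10 == 0. */
  static int countPartition(Weight weight, int minDoc, int maxDoc) {
    PartitionIterator it = weight.iterator("_0", 100, doc -> doc % 10 == 0, minDoc, maxDoc);
    int count = 0;
    for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) throws Exception {
    Weight weight = new Weight();
    ExecutorService pool = Executors.newFixedThreadPool(2);
    // Two partitions of the same segment, searched concurrently.
    Future<Integer> first = pool.submit(() -> countPartition(weight, 0, 50));
    Future<Integer> second = pool.submit(() -> countPartition(weight, 50, 100));
    System.out.println(first.get() + " " + second.get() + " traversals=" + weight.traversals.get());
    pool.shutdown();
  }
}
```

Running `main` prints `5 5 traversals=1`: each partition sees only its own matches, while the shared supplier performs the full-segment build exactly once. Building outside `computeIfAbsent` and guarding it with double-checked locking is a choice made for this sketch; other memoization schemes would also work.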
### Performance Impact

`IntNRQ` improves substantially: QPS rises from 12.30 to 30.18 (+145.3%, p-value 0.000). No other task shows a statistically significant change (every other p-value is 0.123 or higher).

| Task | QPS baseline | StdDev | QPS my_modified_version | StdDev | Pct diff | p-value |
|---|---:|---:|---:|---:|---:|---:|
| Respell | 21.82 | 11.8% | 19.72 | 7.4% | -9.6% (-25% to 10%) | 0.123 |
| BrowseDayOfYearSSDVFacets | 3.14 | 18.6% | 3.01 | 12.2% | -4.0% (-29% to 32%) | 0.688 |
| HighTermTitleBDVSort | 10.51 | 3.7% | 10.11 | 4.5% | -3.7% (-11% to 4%) | 0.152 |
| MedIntervalsOrdered | 36.53 | 7.0% | 35.68 | 6.3% | -2.3% (-14% to 11%) | 0.576 |
| OrNotHighLow | 435.41 | 3.5% | 425.81 | 1.3% | -2.2% (-6% to 2%) | 0.192 |
| AndHighMedDayTaxoFacets | 63.83 | 1.8% | 62.45 | 4.0% | -2.2% (-7% to 3%) | 0.271 |
| HighTermMonthSort | 207.85 | 4.6% | 203.35 | 4.0% | -2.2% (-10% to 6%) | 0.431 |
| OrNotHighHigh | 36.19 | 16.2% | 35.57 | 11.9% | -1.7% (-25% to 31%) | 0.849 |
| BrowseMonthSSDVFacets | 3.12 | 9.4% | 3.07 | 13.6% | -1.5% (-22% to 23%) | 0.839 |
| HighPhrase | 22.87 | 2.6% | 22.58 | 2.6% | -1.3% (-6% to 3%) | 0.437 |
| HighTermDayOfYearSort | 25.58 | 2.4% | 25.26 | 3.1% | -1.3% (-6% to 4%) | 0.477 |
| LowSpanNear | 17.70 | 3.0% | 17.53 | 3.1% | -1.0% (-6% to 5%) | 0.617 |
| HighSloppyPhrase | 9.39 | 2.8% | 9.32 | 2.8% | -0.7% (-6% to 4%) | 0.674 |
| HighSpanNear | 14.31 | 3.7% | 14.21 | 0.7% | -0.7% (-4% to 3%) | 0.692 |
| OrHighNotLow | 148.21 | 12.2% | 147.26 | 11.5% | -0.6% (-21% to 26%) | 0.932 |
| TermDTSort | 26.59 | 4.0% | 26.43 | 2.2% | -0.6% (-6% to 5%) | 0.760 |
| OrHighNotHigh | 50.27 | 11.8% | 50.05 | 12.1% | -0.4% (-21% to 26%) | 0.955 |
| OrHighMed | 111.27 | 8.8% | 110.89 | 8.6% | -0.3% (-16% to 18%) | 0.950 |
| AndHighMed | 210.73 | 2.8% | 210.06 | 1.9% | -0.3% (-4% to 4%) | 0.834 |
| Wildcard | 25.46 | 1.5% | 25.40 | 2.3% | -0.3% (-3% to 3%) | 0.835 |
| OrNotHighMed | 60.00 | 14.7% | 59.88 | 12.5% | -0.2% (-23% to 31%) | 0.981 |
| BrowseDateSSDVFacets | 0.53 | 16.0% | 0.53 | 18.6% | -0.1% (-29% to 41%) | 0.994 |
| HighTerm | 242.89 | 7.4% | 243.56 | 10.1% | 0.3% (-16% to 19%) | 0.961 |
| range | 2796.48 | 7.4% | 2805.50 | 3.1% | 0.3% (-9% to 11%) | 0.928 |
| BrowseDayOfYearTaxoFacets | 2.09 | 7.3% | 2.09 | 11.5% | 0.4% (-17% to 20%) | 0.952 |
| OrHighHigh | 35.14 | 9.7% | 35.32 | 12.6% | 0.5% (-19% to 25%) | 0.943 |
| MedSloppyPhrase | 11.84 | 1.4% | 11.91 | 3.9% | 0.6% (-4% to 6%) | 0.746 |
| Prefix3 | 33.61 | 3.1% | 33.84 | 2.8% | 0.7% (-5% to 6%) | 0.717 |
| LowIntervalsOrdered | 99.26 | 2.7% | 99.96 | 3.8% | 0.7% (-5% to 7%) | 0.737 |
| MedTermDayTaxoFacets | 16.22 | 6.0% | 16.35 | 8.0% | 0.8% (-12% to 15%) | 0.859 |
| HighIntervalsOrdered | 2.98 | 12.3% | 3.01 | 8.0% | 0.8% (-17% to 24%) | 0.897 |
| IntSet | 140.75 | 4.4% | 142.36 | 5.8% | 1.1% (-8% to 11%) | 0.726 |
| AndHighHighDayTaxoFacets | 12.74 | 5.4% | 12.90 | 3.5% | 1.3% (-7% to 10%) | 0.647 |
| HighTermTitleSort | 14.03 | 1.3% | 14.22 | 2.6% | 1.3% (-2% to 5%) | 0.295 |
| BrowseRandomLabelTaxoFacets | 1.72 | 5.8% | 1.75 | 4.9% | 1.4% (-8% to 12%) | 0.688 |
| OrHighMedDayTaxoFacets | 1.15 | 3.8% | 1.17 | 5.0% | 1.5% (-6% to 10%) | 0.580 |
| MedPhrase | 58.94 | 4.5% | 59.93 | 4.7% | 1.7% (-7% to 11%) | 0.561 |
| Fuzzy1 | 34.72 | 7.2% | 35.33 | 5.7% | 1.8% (-10% to 15%) | 0.670 |
| AndHighLow | 531.18 | 1.7% | 541.24 | 6.5% | 1.9% (-6% to 10%) | 0.527 |
| BrowseMonthTaxoFacets | 2.17 | 8.9% | 2.21 | 12.3% | 2.0% (-17% to 25%) | 0.772 |
| LowPhrase | 12.05 | 3.8% | 12.34 | 3.4% | 2.4% (-4% to 9%) | 0.294 |
| LowSloppyPhrase | 16.99 | 2.2% | 17.40 | 3.2% | 2.4% (-2% to 7%) | 0.162 |
| LowTerm | 459.98 | 16.0% | 472.15 | 17.4% | 2.6% (-26% to 42%) | 0.803 |
| BrowseRandomLabelSSDVFacets | 2.12 | 8.3% | 2.18 | 13.3% | 2.9% (-17% to 26%) | 0.678 |
| MedSpanNear | 4.41 | 5.3% | 4.54 | 7.2% | 3.1% (-8% to 16%) | 0.434 |
| AndHighHigh | 48.99 | 9.9% | 50.53 | 11.9% | 3.1% (-17% to 27%) | 0.650 |
| OrHighLow | 342.27 | 6.5% | 353.27 | 3.7% | 3.2% (-6% to 14%) | 0.334 |
| OrHighNotMed | 117.56 | 12.8% | 122.42 | 11.0% | 4.1% (-17% to 31%) | 0.582 |
| MedTerm | 273.79 | 15.1% | 285.16 | 13.9% | 4.2% (-21% to 39%) | 0.652 |
| Fuzzy2 | 38.07 | 9.0% | 39.69 | 11.0% | 4.2% (-14% to 26%) | 0.502 |
| PKLookup | 139.01 | 11.7% | 146.39 | 7.1% | 5.3% (-12% to 27%) | 0.386 |
| BrowseDateTaxoFacets | 2.02 | 5.9% | 2.16 | 11.7% | 7.1% (-9% to 26%) | 0.228 |
| IntNRQ | 12.30 | 3.8% | 30.18 | 8.2% | 145.3% (128% to 163%) | 0.000 |

### Related Issues

- https://github.com/apache/lucene/issues/13745
- ~https://github.com/apache/lucene/issues/14485

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
