prudhvigodithi opened a new pull request, #15446:
URL: https://github.com/apache/lucene/pull/15446

   ### Description
   
   This PR optimizes `PointRangeQuery` for intra-segment concurrent search by caching the `DocIdSet` at the segment level. When a large segment is split into multiple partitions for parallel processing, all partitions now share the result of a single BKD tree traversal instead of each partition repeating the same traversal. The approach came out of the discussion in https://github.com/apache/lucene/pull/15383. Related issue (intra-segment concurrency for `PointRangeQuery`): https://github.com/apache/lucene/issues/13745.
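   The core idea is simple: traverse once per segment, then share the result across that segment's partitions. The sketch below shows this in plain Java with no Lucene dependency. It is a hypothetical illustration, not the PR's code:

   - `SegmentDocIdCache`, `Lazy` and `simulate` are invented names.
   - The generic key `K` stands in for `LeafReaderContext`.
   - A sorted `int[]` of doc IDs stands in for a `DocIdSet`.
   - The `IntFunction` stands in for the BKD traversal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;
import java.util.stream.IntStream;

/**
 * Hypothetical sketch of a segment-level, compute-once cache. K stands in for
 * LeafReaderContext; a sorted int[] of matching doc IDs stands in for a DocIdSet.
 */
final class SegmentDocIdCache<K> {
  private final ConcurrentHashMap<K, Lazy> cache = new ConcurrentHashMap<>();
  private final AtomicInteger traversals = new AtomicInteger();
  private final IntFunction<int[]> traversal; // stand-in for the BKD crawl over [0, maxDoc)

  SegmentDocIdCache(IntFunction<int[]> traversal) {
    this.traversal = traversal;
  }

  /** Lazy per-segment holder: the traversal runs at most once, on first use. */
  private final class Lazy {
    private final int maxDoc;
    private volatile int[] docs;

    Lazy(int maxDoc) {
      this.maxDoc = maxDoc;
    }

    int[] get() {
      int[] result = docs;
      if (result == null) { // double-checked locking on a volatile field
        synchronized (this) {
          result = docs;
          if (result == null) {
            traversals.incrementAndGet();
            result = traversal.apply(maxDoc);
            docs = result;
          }
        }
      }
      return result;
    }
  }

  /** Every partition of the same segment receives the same array instance. */
  int[] docIds(K segment, int maxDoc) {
    // computeIfAbsent only creates the cheap holder; the expensive work runs in get(),
    // outside the map's internal locking.
    return cache.computeIfAbsent(segment, k -> new Lazy(maxDoc)).get();
  }

  int traversalCount() {
    return traversals.get();
  }

  /** Simulates `partitions` threads searching one segment; returns how many traversals ran. */
  static int simulate(int partitions, int maxDoc) {
    SegmentDocIdCache<String> cache =
        new SegmentDocIdCache<>(n -> IntStream.range(0, n).filter(d -> d % 3 == 0).toArray());
    ExecutorService pool = Executors.newFixedThreadPool(partitions);
    try {
      List<Future<int[]>> results = new ArrayList<>();
      for (int p = 0; p < partitions; p++) {
        results.add(pool.submit(() -> cache.docIds("segment-0", maxDoc)));
      }
      int[] first = results.get(0).get();
      for (Future<int[]> f : results) {
        if (f.get() != first) {
          throw new IllegalStateException("partitions received different DocIdSets");
        }
      }
      return cache.traversalCount();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

   `simulate(8, 1_000)` runs eight concurrent lookups against the same segment and should report a single traversal. The map only stores a cheap holder, and the expensive work happens in the holder's own `get()`. This matters because the `ConcurrentHashMap` Javadoc warns that a mapping function passed to `computeIfAbsent` may block other updates to the map while it runs, so it should be short and simple.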
   
   ### Problem
   
   With intra-segment concurrency enabled, a single segment can be split into multiple partitions, each processed by a different thread. In the current implementation, each partition independently traverses the BKD tree and builds its own `DocIdSet`, so the same BKD crawl is repeated once per partition. This duplicated work shows up as higher query latency (see https://github.com/apache/lucene/pull/13542#issuecomment-2332114836).
   
   
   ### Solution
   
   This PR implements a segment-level cache that ensures the BKD tree is traversed only once per segment, with the resulting `DocIdSet` shared across all partitions:
   
   1. **SegmentDocIdSetSupplier**: A new helper class that lazily builds and caches the `DocIdSet` for an entire segment.
   
   2. **Segment-level cache**: A `ConcurrentHashMap<LeafReaderContext, SegmentDocIdSetSupplier>` in the `Weight` that ensures all partitions of the same segment share the same supplier.
   
   3. **PartitionScorerSupplier**: A new `ScorerSupplier` implementation that references the shared cache and restricts results to the partition's doc ID range.
   
   4. **PartitionFilteredDocIdSetIterator**: A lightweight iterator wrapper that filters the shared full-segment `DocIdSet` so that it returns only docs within the partition's range.
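   Items 3 and 4 can be illustrated with a small stand-alone iterator: many partitions read one shared, sorted result, and each sees only its own doc ID range. This is a hypothetical sketch, not the PR's `PartitionFilteredDocIdSetIterator`:

   - `PartitionFilteredIterator` and `collect` are invented names.
   - The shared `DocIdSet` is modeled as a sorted `int[]`.
   - A partition is assumed to cover the half-open range `[minDoc, maxDoc)`.
   - The methods mirror the `nextDoc()` / `advance()` / `NO_MORE_DOCS` contract of Lucene's `DocIdSetIterator`, but the class does not extend it.

```java
import java.util.Arrays;

/**
 * Hypothetical partition-filtered iterator. The shared full-segment result is a sorted int[]
 * of doc IDs (stand-in for a DocIdSet); this partition only sees docs in [minDoc, maxDoc).
 * Method names mirror Lucene's DocIdSetIterator, but this class does not extend it.
 */
final class PartitionFilteredIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same sentinel value Lucene uses

  private final int[] segmentDocs; // shared by all partitions; never modified here
  private final int maxDoc; // exclusive upper bound of this partition
  private int index; // position of the next candidate in segmentDocs
  private int doc = -1; // current doc; -1 before the first call

  PartitionFilteredIterator(int[] segmentDocs, int minDoc, int maxDoc) {
    this.segmentDocs = segmentDocs;
    this.maxDoc = maxDoc;
    // Binary search jumps straight to the first doc >= minDoc instead of scanning.
    this.index = lowerBound(segmentDocs, 0, minDoc);
  }

  int docID() {
    return doc;
  }

  int nextDoc() {
    if (index < segmentDocs.length && segmentDocs[index] < maxDoc) {
      doc = segmentDocs[index++];
    } else {
      doc = NO_MORE_DOCS; // past the partition end: stop even if the segment has more docs
    }
    return doc;
  }

  /** Moves to the first doc >= target within this partition (target must exceed docID()). */
  int advance(int target) {
    index = lowerBound(segmentDocs, index, target);
    return nextDoc();
  }

  /** Index of the first element >= key in docs[from..]; assumes unique, sorted doc IDs. */
  private static int lowerBound(int[] docs, int from, int key) {
    int pos = Arrays.binarySearch(docs, from, docs.length, key);
    return pos >= 0 ? pos : -pos - 1;
  }

  /** Convenience for the example: drains one partition into an array. */
  static int[] collect(int[] segmentDocs, int minDoc, int maxDoc) {
    PartitionFilteredIterator it = new PartitionFilteredIterator(segmentDocs, minDoc, maxDoc);
    int[] out = new int[segmentDocs.length];
    int n = 0;
    for (int d = it.nextDoc(); d != NO_MORE_DOCS; d = it.nextDoc()) {
      out[n++] = d;
    }
    return Arrays.copyOf(out, n);
  }
}
```

   With `shared = {1, 4, 7, 10, 13, 16}`, the partitions `[0, 8)` and `[8, 20)` yield `{1, 4, 7}` and `{10, 13, 16}` from the same array. Each partition needs its own iterator because iterators are stateful, while the underlying result can be shared read-only. A Lucene-based version would more likely wrap the iterator returned by the shared `DocIdSet.iterator()` and call `advance(minDocId)` once. That is an assumption about the approach, not a description of the PR's code.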
   
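   ### Performance Impact: large improvement for `IntNRQ`
   
   `IntNRQ` throughput rises from 12.30 to 30.18 QPS (+145.3%, p-value 0.000). No other task shows a statistically significant change: every other p-value is above 0.1.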
   
   ```
                           Task QPS baseline    StdDev QPS my_modified_version    StdDev                 Pct diff  p-value
                        Respell        21.82   (11.8%)                   19.72    (7.4%)    -9.6% ( -25% -   10%)    0.123
      BrowseDayOfYearSSDVFacets         3.14   (18.6%)                    3.01   (12.2%)    -4.0% ( -29% -   32%)    0.688
           HighTermTitleBDVSort        10.51    (3.7%)                   10.11    (4.5%)    -3.7% ( -11% -    4%)    0.152
            MedIntervalsOrdered        36.53    (7.0%)                   35.68    (6.3%)    -2.3% ( -14% -   11%)    0.576
                   OrNotHighLow       435.41    (3.5%)                  425.81    (1.3%)    -2.2% (  -6% -    2%)    0.192
        AndHighMedDayTaxoFacets        63.83    (1.8%)                   62.45    (4.0%)    -2.2% (  -7% -    3%)    0.271
              HighTermMonthSort       207.85    (4.6%)                  203.35    (4.0%)    -2.2% ( -10% -    6%)    0.431
                  OrNotHighHigh        36.19   (16.2%)                   35.57   (11.9%)    -1.7% ( -25% -   31%)    0.849
          BrowseMonthSSDVFacets         3.12    (9.4%)                    3.07   (13.6%)    -1.5% ( -22% -   23%)    0.839
                     HighPhrase        22.87    (2.6%)                   22.58    (2.6%)    -1.3% (  -6% -    3%)    0.437
          HighTermDayOfYearSort        25.58    (2.4%)                   25.26    (3.1%)    -1.3% (  -6% -    4%)    0.477
                    LowSpanNear        17.70    (3.0%)                   17.53    (3.1%)    -1.0% (  -6% -    5%)    0.617
               HighSloppyPhrase         9.39    (2.8%)                    9.32    (2.8%)    -0.7% (  -6% -    4%)    0.674
                   HighSpanNear        14.31    (3.7%)                   14.21    (0.7%)    -0.7% (  -4% -    3%)    0.692
                   OrHighNotLow       148.21   (12.2%)                  147.26   (11.5%)    -0.6% ( -21% -   26%)    0.932
                     TermDTSort        26.59    (4.0%)                   26.43    (2.2%)    -0.6% (  -6% -    5%)    0.760
                  OrHighNotHigh        50.27   (11.8%)                   50.05   (12.1%)    -0.4% ( -21% -   26%)    0.955
                      OrHighMed       111.27    (8.8%)                  110.89    (8.6%)    -0.3% ( -16% -   18%)    0.950
                     AndHighMed       210.73    (2.8%)                  210.06    (1.9%)    -0.3% (  -4% -    4%)    0.834
                       Wildcard        25.46    (1.5%)                   25.40    (2.3%)    -0.3% (  -3% -    3%)    0.835
                   OrNotHighMed        60.00   (14.7%)                   59.88   (12.5%)    -0.2% ( -23% -   31%)    0.981
           BrowseDateSSDVFacets         0.53   (16.0%)                    0.53   (18.6%)    -0.1% ( -29% -   41%)    0.994
                       HighTerm       242.89    (7.4%)                  243.56   (10.1%)     0.3% ( -16% -   19%)    0.961
                          range      2796.48    (7.4%)                 2805.50    (3.1%)     0.3% (  -9% -   11%)    0.928
      BrowseDayOfYearTaxoFacets         2.09    (7.3%)                    2.09   (11.5%)     0.4% ( -17% -   20%)    0.952
                     OrHighHigh        35.14    (9.7%)                   35.32   (12.6%)     0.5% ( -19% -   25%)    0.943
                MedSloppyPhrase        11.84    (1.4%)                   11.91    (3.9%)     0.6% (  -4% -    6%)    0.746
                        Prefix3        33.61    (3.1%)                   33.84    (2.8%)     0.7% (  -5% -    6%)    0.717
            LowIntervalsOrdered        99.26    (2.7%)                   99.96    (3.8%)     0.7% (  -5% -    7%)    0.737
           MedTermDayTaxoFacets        16.22    (6.0%)                   16.35    (8.0%)     0.8% ( -12% -   15%)    0.859
           HighIntervalsOrdered         2.98   (12.3%)                    3.01    (8.0%)     0.8% ( -17% -   24%)    0.897
                         IntSet       140.75    (4.4%)                  142.36    (5.8%)     1.1% (  -8% -   11%)    0.726
       AndHighHighDayTaxoFacets        12.74    (5.4%)                   12.90    (3.5%)     1.3% (  -7% -   10%)    0.647
              HighTermTitleSort        14.03    (1.3%)                   14.22    (2.6%)     1.3% (  -2% -    5%)    0.295
    BrowseRandomLabelTaxoFacets         1.72    (5.8%)                    1.75    (4.9%)     1.4% (  -8% -   12%)    0.688
         OrHighMedDayTaxoFacets         1.15    (3.8%)                    1.17    (5.0%)     1.5% (  -6% -   10%)    0.580
                      MedPhrase        58.94    (4.5%)                   59.93    (4.7%)     1.7% (  -7% -   11%)    0.561
                         Fuzzy1        34.72    (7.2%)                   35.33    (5.7%)     1.8% ( -10% -   15%)    0.670
                     AndHighLow       531.18    (1.7%)                  541.24    (6.5%)     1.9% (  -6% -   10%)    0.527
          BrowseMonthTaxoFacets         2.17    (8.9%)                    2.21   (12.3%)     2.0% ( -17% -   25%)    0.772
                      LowPhrase        12.05    (3.8%)                   12.34    (3.4%)     2.4% (  -4% -    9%)    0.294
                LowSloppyPhrase        16.99    (2.2%)                   17.40    (3.2%)     2.4% (  -2% -    7%)    0.162
                        LowTerm       459.98   (16.0%)                  472.15   (17.4%)     2.6% ( -26% -   42%)    0.803
    BrowseRandomLabelSSDVFacets         2.12    (8.3%)                    2.18   (13.3%)     2.9% ( -17% -   26%)    0.678
                    MedSpanNear         4.41    (5.3%)                    4.54    (7.2%)     3.1% (  -8% -   16%)    0.434
                    AndHighHigh        48.99    (9.9%)                   50.53   (11.9%)     3.1% ( -17% -   27%)    0.650
                      OrHighLow       342.27    (6.5%)                  353.27    (3.7%)     3.2% (  -6% -   14%)    0.334
                   OrHighNotMed       117.56   (12.8%)                  122.42   (11.0%)     4.1% ( -17% -   31%)    0.582
                        MedTerm       273.79   (15.1%)                  285.16   (13.9%)     4.2% ( -21% -   39%)    0.652
                         Fuzzy2        38.07    (9.0%)                   39.69   (11.0%)     4.2% ( -14% -   26%)    0.502
                       PKLookup       139.01   (11.7%)                  146.39    (7.1%)     5.3% ( -12% -   27%)    0.386
           BrowseDateTaxoFacets         2.02    (5.9%)                    2.16   (11.7%)     7.1% (  -9% -   26%)    0.228
                         IntNRQ        12.30    (3.8%)                   30.18    (8.2%)   145.3% ( 128% -  163%)    0.000
   ```
   
   ### Related Issues
   
   - https://github.com/apache/lucene/issues/13745 
   - ~https://github.com/apache/lucene/issues/14485


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]