Re: [PR] Allow Collectors to re-order segments for non-exhaustive searches [lucene]

via GitHub Mon, 01 Dec 2025 02:02:47 -0800


romseygeek commented on PR #15436:
URL: https://github.com/apache/lucene/pull/15436#issuecomment-3595659295


   I added a couple of sorted MatchAll queries to `wikimedium.10M.tasks` and 
tested this out on an index sorted by `lastMod`.  In this case it makes 
essentially no difference; none of the changes is statistically significant 
(every p-value is above 0.05):
   ```
                              Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
           MatchAllDateTimeDescSort       29.30     (37.7%)       27.29     (23.6%)   -6.9% ( -49% -   87%) 0.490
              HighTermDayOfYearSort       42.40     (10.8%)       40.19     (10.7%)   -5.2% ( -24% -   18%) 0.126
               TermDateTimeDescSort      222.80      (4.0%)      219.28      (4.4%)   -1.6% (  -9% -    7%) 0.236
               HighTermTitleBDVSort        6.88      (4.0%)        6.87      (3.2%)   -0.1% (  -7% -    7%) 0.904
               MatchAllDateTimeSort        9.01     (11.3%)        9.04      (9.6%)    0.3% ( -18% -   23%) 0.921
                           PKLookup      130.26      (2.3%)      130.92      (2.2%)    0.5% (  -3% -    5%) 0.478
                         TermDTSort       52.50     (11.2%)       53.30     (15.5%)    1.5% ( -22% -   31%) 0.721
                  HighTermMonthSort       37.38      (9.4%)       39.34      (9.2%)    5.2% ( -12% -   26%) 0.074
   ```
   The `lastMod` values are fairly evenly distributed between segments, so 
every segment covers roughly the same range of sort values and sorting the 
segments doesn't really have an effect: no visiting order lets the collector 
skip much.  I think a more interesting experiment would use something like 
time series data, where the input is naturally close to sorted and so the sort 
values in different segments are mostly disjoint.  I'll see if I can mock 
something up and run these tests again.
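
   To make the overlapping-vs-disjoint reasoning concrete, here is a toy model 
   in Java. It is not Lucene code: `SegmentOrderSketch`, `Segment`, 
   `segmentsVisited` and the two data layouts are hypothetical names. It assumes 
   a descending sort, a top-k collector that skips a whole segment once that 
   segment's maximum value can't beat the current k-th best hit, and re-ordering 
   segments by descending max value. Lucene's actual pruning is more involved, so 
   treat this only as an illustration of why segment order matters for one 
   layout and not the other.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.PriorityQueue;

   public class SegmentOrderSketch {

     // Hypothetical stand-in for a segment: only the sort values it holds.
     record Segment(long[] values) {
       long max() {
         long m = Long.MIN_VALUE;
         for (long v : values) {
           m = Math.max(m, v);
         }
         return m;
       }
     }

     // Collects the k largest values (a descending sort), visiting segments in the
     // given order. Once k hits are held, a segment whose max cannot beat the k-th
     // best value is skipped without reading its values. Returns how many
     // segments were actually visited.
     static int segmentsVisited(List<Segment> segments, int k) {
       PriorityQueue<Long> topK = new PriorityQueue<>(); // min-heap: head is the k-th best
       int visited = 0;
       for (Segment seg : segments) {
         if (topK.size() == k && seg.max() <= topK.peek()) {
           continue; // nothing in this segment is competitive
         }
         visited++;
         for (long v : seg.values()) {
           if (topK.size() < k) {
             topK.add(v);
           } else if (v > topK.peek()) {
             topK.poll();
             topK.add(v);
           }
         }
       }
       return visited;
     }

     // Re-orders segments so the one with the largest max value is visited first.
     static List<Segment> sortedByMaxDesc(List<Segment> segments) {
       List<Segment> copy = new ArrayList<>(segments);
       copy.sort((a, b) -> Long.compare(b.max(), a.max()));
       return copy;
     }

     // Time-series-like layout: segment i holds the contiguous range [i*size, (i+1)*size).
     static List<Segment> disjoint(int numSegments, int size) {
       List<Segment> segs = new ArrayList<>();
       for (int i = 0; i < numSegments; i++) {
         long[] vals = new long[size];
         for (int j = 0; j < size; j++) {
           vals[j] = (long) i * size + j;
         }
         segs.add(new Segment(vals));
       }
       return segs;
     }

     // Evenly spread layout: segment i holds i, i + numSegments, i + 2*numSegments, ...
     static List<Segment> interleaved(int numSegments, int size) {
       List<Segment> segs = new ArrayList<>();
       for (int i = 0; i < numSegments; i++) {
         long[] vals = new long[size];
         for (int j = 0; j < size; j++) {
           vals[j] = (long) j * numSegments + i;
         }
         segs.add(new Segment(vals));
       }
       return segs;
     }

     public static void main(String[] args) {
       int k = 10;
       List<Segment> timeSeries = disjoint(10, 1000);
       List<Segment> even = interleaved(10, 1000);
       System.out.println("disjoint, index order:        " + segmentsVisited(timeSeries, k));
       System.out.println("disjoint, sorted by max:      " + segmentsVisited(sortedByMaxDesc(timeSeries), k));
       System.out.println("evenly spread, index order:   " + segmentsVisited(even, k));
       System.out.println("evenly spread, sorted by max: " + segmentsVisited(sortedByMaxDesc(even), k));
     }
   }
   ```

   With these inputs the disjoint layout visits all 10 segments in index order 
   but only 1 once segments are sorted by max, while the evenly spread layout 
   visits all 10 either way, which is the intuition behind the `lastMod` result.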
   
   On the plus side, sorting the segments doesn't seem to carry a noticeable 
penalty, so the escape hatch may not be necessary.  But I want to make sure 
there are real benefits as well!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

