romseygeek commented on PR #15436:
URL: https://github.com/apache/lucene/pull/15436#issuecomment-3595659295
I added a couple of sorted MatchAll queries to `wikimedium.10M.tasks` and
tested this out on an index sorted by `lastMod`. In this case it makes
essentially no difference; none of the changes is statistically significant:
```
                    Task    QPS baseline   StdDev   QPS my_modified_version   StdDev   Pct diff                p-value
MatchAllDateTimeDescSort           29.30  (37.7%)                     27.29  (23.6%)      -6.9% ( -49% -   87%)   0.490
   HighTermDayOfYearSort           42.40  (10.8%)                     40.19  (10.7%)      -5.2% ( -24% -   18%)   0.126
    TermDateTimeDescSort          222.80   (4.0%)                    219.28   (4.4%)      -1.6% (  -9% -    7%)   0.236
    HighTermTitleBDVSort            6.88   (4.0%)                      6.87   (3.2%)      -0.1% (  -7% -    7%)   0.904
    MatchAllDateTimeSort            9.01  (11.3%)                      9.04   (9.6%)       0.3% ( -18% -   23%)   0.921
                PKLookup          130.26   (2.3%)                    130.92   (2.2%)       0.5% (  -3% -    5%)   0.478
              TermDTSort           52.50  (11.2%)                     53.30  (15.5%)       1.5% ( -22% -   31%)   0.721
       HighTermMonthSort           37.38   (9.4%)                     39.34   (9.2%)       5.2% ( -12% -   26%)   0.074
```
The `lastMod` values are fairly evenly distributed between segments, so
segment sorting doesn't really have an effect. I think a more interesting
experiment would be with something like time series data where the input is
naturally close to sorted and so the sort values in segments are mostly
disjoint. I'll see if I can mock something up and run these tests again.
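To illustrate why the distribution of sort values across segments matters, here is a toy model in Python. It is only a sketch and not Lucene code: the function `top_k_ascending`, the two sample datasets, and the "skip a segment when its minimum value is not competitive" rule are all assumptions made up for illustration. Each segment is modelled as a list of sort values, and a segment is skipped once the top-k heap is full and the segment's minimum cannot beat the current worst hit.

```python
import heapq


def top_k_ascending(segments, k, sort_segments):
    """Toy model (hypothetical, not Lucene): collect the k smallest values.

    Returns (top-k values in ascending order, number of segments scanned).
    """
    # Optionally visit segments in order of their minimum sort value.
    order = sorted(segments, key=min) if sort_segments else list(segments)
    heap = []  # max-heap via negation: -heap[0] is the worst hit kept so far
    scanned = 0
    for seg in order:
        # Once the heap is full, a segment whose minimum is not better than
        # the current worst hit cannot contribute anything, so skip it.
        if len(heap) == k and min(seg) >= -heap[0]:
            continue
        scanned += 1
        for v in seg:
            if len(heap) < k:
                heapq.heappush(heap, -v)
            elif v < -heap[0]:
                heapq.heapreplace(heap, -v)
    return sorted(-x for x in heap), scanned


# Values spread evenly across segments (roughly the `lastMod` situation).
interleaved = [list(range(start, 40, 4)) for start in range(4)]

# Time-series-like data: each segment covers a disjoint range, and the
# segments happen to be stored newest-first.
disjoint = [list(range(start, start + 10)) for start in (30, 20, 10, 0)]

print(top_k_ascending(interleaved, 5, True))   # ([0, 1, 2, 3, 4], 4)
print(top_k_ascending(disjoint, 5, False))     # ([0, 1, 2, 3, 4], 4)
print(top_k_ascending(disjoint, 5, True))      # ([0, 1, 2, 3, 4], 1)
```

In this toy model, sorting the segments changes nothing for the interleaved data, because every segment still holds competitive values, while for the disjoint data it lets three of the four segments be skipped.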
On the plus side, there doesn't seem to be a noticeable penalty for doing
this sorting, so the escape hatch may not be necessary. But I want to make
sure that there are actual benefits as well!