Nikolay Khitrin created LUCENE-8432:
---------------------------------------
Summary: Stop calling comparator even if early termination is not
possible
Key: LUCENE-8432
URL: https://issues.apache.org/jira/browse/LUCENE-8432
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Affects Versions: 7.3
Reporter: Nikolay Khitrin
TopFieldCollector keeps calling comparator.compareBottom when trackMaxScore or
trackTotalHits is set, even after the result is known in advance from the
document order. With either option set, the collector cannot terminate early,
because it still has to visit every hit.
The comparator call is not cheap, because it can involve a doc values (DV) read
from disk, and all such calls can be skipped once the first non-competitive
document in the segment is reached.
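The idea can be illustrated with a small standalone sketch. This is not the attached patch and does not use Lucene classes: the class SkipComparatorSketch, its collect method and the collectedAllCompetitiveHits flag are hypothetical names, sort values are modelled as plain longs, and each segment is assumed to be stored in ascending sort order already. The counter stands in for calls to comparator.compareBottom.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Hypothetical model of a top-N collector; not Lucene's TopFieldCollector.
final class SkipComparatorSketch {

    static final class Result {
        final int totalHits;        // every document is still counted
        final int comparatorCalls;  // stand-in for compareBottom invocations
        final long bottom;          // worst value kept in the top-N queue

        Result(int totalHits, int comparatorCalls, long bottom) {
            this.totalHits = totalHits;
            this.comparatorCalls = comparatorCalls;
            this.bottom = bottom;
        }
    }

    // Each inner array is one segment, assumed sorted ascending by the sort key.
    static Result collect(long[][] segments, int topN, boolean skipKnownNonCompetitive) {
        if (topN < 1) {
            throw new IllegalArgumentException("topN must be >= 1");
        }
        // Max-heap: peek() returns the current bottom (largest kept value).
        PriorityQueue<Long> queue = new PriorityQueue<>(Collections.reverseOrder());
        int totalHits = 0;
        int comparatorCalls = 0;
        for (long[] segment : segments) {
            // Reset per segment: ordering is only guaranteed within a segment.
            boolean collectedAllCompetitiveHits = false;
            for (long value : segment) {
                totalHits++; // total hit tracking forces a visit to every doc
                if (queue.size() < topN) {
                    queue.add(value);
                    continue;
                }
                if (collectedAllCompetitiveHits) {
                    continue; // outcome already known from document order
                }
                comparatorCalls++;
                if (value >= queue.peek()) {
                    // Non-competitive; later docs in this segment are no better.
                    collectedAllCompetitiveHits = skipKnownNonCompetitive;
                    continue;
                }
                queue.poll();
                queue.add(value);
            }
        }
        return new Result(totalHits, comparatorCalls, queue.peek());
    }

    public static void main(String[] args) {
        long[][] segments = {{1, 4, 7, 10}, {2, 3, 8, 9}};
        Result baseline = collect(segments, 2, false);
        Result patched = collect(segments, 2, true);
        System.out.println("baseline: hits=" + baseline.totalHits
            + " comparatorCalls=" + baseline.comparatorCalls);
        System.out.println("patched:  hits=" + patched.totalHits
            + " comparatorCalls=" + patched.comparatorCalls);
    }
}
```

For these two toy segments and a top 2, both modes count 8 hits and keep the same top values, but the skipping mode makes 3 comparisons instead of 6.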
There is a patch and a luceneutil report on wikimedium10m sorted by DayOfYear:
{noformat}
Task                       QPS baseline   StdDev  QPS patch   StdDev  Pct diff
HighTermMonthSort                226.04   (6.3%)     215.33   (4.3%)  -4.7% ( -14% - 6%)
LowTerm                          933.27   (5.5%)     924.62   (4.2%)  -0.9% ( -10% - 9%)
OrNotHighLow                     945.68   (5.7%)     939.12   (4.5%)  -0.7% ( -10% - 10%)
MedSpanNear                       28.76   (1.4%)      28.61   (1.5%)  -0.5% ( -3% - 2%)
BrowseDayOfYearSSDVFacets         16.36   (5.0%)      16.29   (4.5%)  -0.4% ( -9% - 9%)
AndHighMed                       112.30   (2.9%)     111.96   (1.6%)  -0.3% ( -4% - 4%)
LowSpanNear                       12.42   (1.5%)      12.38   (1.6%)  -0.3% ( -3% - 2%)
HighSloppyPhrase                  18.66   (3.9%)      18.62   (4.0%)  -0.2% ( -7% - 7%)
MedPhrase                        219.40   (2.7%)     219.06   (2.7%)  -0.2% ( -5% - 5%)
OrNotHighMed                     222.88   (3.2%)     222.63   (3.4%)  -0.1% ( -6% - 6%)
AndHighLow                       521.59   (3.5%)     521.02   (4.5%)  -0.1% ( -7% - 8%)
MedSloppyPhrase                   16.71   (4.7%)      16.70   (4.7%)  -0.0% ( -8% - 9%)
LowPhrase                         15.58   (2.5%)      15.59   (2.9%)   0.0% ( -5% - 5%)
Respell                           92.05   (2.4%)      92.19   (3.0%)   0.2% ( -5% - 5%)
HighSpanNear                      17.03   (2.2%)      17.06   (2.1%)   0.2% ( -4% - 4%)
HighPhrase                        37.85   (5.8%)      37.92   (5.9%)   0.2% ( -10% - 12%)
OrHighNotLow                     118.25   (2.9%)     118.47   (3.5%)   0.2% ( -6% - 6%)
BrowseMonthTaxoFacets              2.94   (0.4%)       2.94   (0.8%)   0.2% ( 0% - 1%)
BrowseDateTaxoFacets               2.75   (0.3%)       2.75   (1.6%)   0.3% ( -1% - 2%)
LowSloppyPhrase                  105.28   (2.3%)     105.60   (2.5%)   0.3% ( -4% - 5%)
Prefix3                          122.07   (6.8%)     122.55   (6.5%)   0.4% ( -12% - 14%)
OrNotHighHigh                     55.07   (3.8%)      55.29   (4.5%)   0.4% ( -7% - 8%)
BrowseMonthSSDVFacets             20.88   (7.2%)      20.99   (7.5%)   0.5% ( -13% - 16%)
OrHighNotHigh                     58.40   (4.2%)      58.72   (4.8%)   0.6% ( -8% - 9%)
Wildcard                          79.87   (3.7%)      80.31   (4.0%)   0.6% ( -6% - 8%)
OrHighMed                         13.25   (4.3%)      13.34   (4.9%)   0.6% ( -8% - 10%)
BrowseDayOfYearTaxoFacets          2.73   (0.6%)       2.75   (1.6%)   0.7% ( -1% - 2%)
OrHighHigh                        22.03   (4.1%)      22.19   (4.9%)   0.7% ( -8% - 10%)
AndHighHigh                       23.46   (2.1%)      23.63   (1.9%)   0.7% ( -3% - 4%)
PKLookup                         145.59   (4.2%)     146.66   (4.3%)   0.7% ( -7% - 9%)
MedTerm                          171.13   (5.0%)     172.43   (5.1%)   0.8% ( -8% - 11%)
OrHighLow                        119.22   (2.8%)     120.23   (3.1%)   0.8% ( -4% - 6%)
OrHighNotMed                      87.06   (3.7%)      87.80   (4.1%)   0.8% ( -6% - 8%)
IntNRQ                            26.44  (12.8%)      26.68  (11.5%)   0.9% ( -20% - 28%)
HighTerm                         107.64   (6.1%)     108.88   (5.6%)   1.2% ( -9% - 13%)
Fuzzy2                            69.69  (10.7%)      71.64   (7.4%)   2.8% ( -13% - 23%)
Fuzzy1                            53.95   (6.5%)      55.79   (6.2%)   3.4% ( -8% - 17%)
HighTermDayOfYearSort             19.71   (4.7%)      21.51   (7.1%)   9.1% ( -2% - 21%)
{noformat}
Unfortunately, luceneutil shows a regression when the search sort does not
match the index sort (HighTermMonthSort). I can't reproduce the regression on
any real case, but I'm afraid my benchmarks aren't quite accurate.