jpountz opened a new pull request, #14970:
URL: https://github.com/apache/lucene/pull/14970
PRs #14906 and #14896 improved the efficiency of filtering by score. This PR
tries to get some extra speedup by:
- Skipping filtering by score when applying a non-essential clause that
doesn't have matches over the range of doc IDs being scored.
- Filtering on `float[]` scores rather than `double[]` scores whenever
applicable so that vectorization can work on twice as many lanes at once.
- Filtering by score using `VectorUtil#filterByScore` instead of relying on
the collector to do it.
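
To make the second and third points concrete, here is a minimal scalar sketch of filtering parallel doc and score buffers in place on `float` scores. It is an illustration only: the class `ScoreFilterSketch` and the method signature below are hypothetical, and they are not claimed to match the actual `VectorUtil#filterByScore` signature or its vectorized implementation.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of Lucene.
final class ScoreFilterSketch {

  /**
   * Compacts the first {@code upTo} entries of the parallel {@code docs} and
   * {@code scores} buffers in place, keeping only entries whose score is
   * greater than or equal to {@code minScoreInclusive}.
   *
   * @return the number of entries kept
   */
  public static int filterByScore(int[] docs, float[] scores, float minScoreInclusive, int upTo) {
    int newSize = 0;
    for (int i = 0; i < upTo; i++) {
      int doc = docs[i];
      float score = scores[i];
      // Store unconditionally and only advance the write cursor when the
      // entry is competitive, so the stores are not guarded by a branch.
      docs[newSize] = doc;
      scores[newSize] = score;
      if (score >= minScoreInclusive) {
        newSize++;
      }
    }
    return newSize;
  }

  public static void main(String[] args) {
    int[] docs = {3, 7, 12, 20};
    float[] scores = {0.5f, 2.0f, 1.0f, 3.5f};
    int size = filterByScore(docs, scores, 1.0f, docs.length);
    // prints "3 [7, 12, 20]"
    System.out.println(size + " " + Arrays.toString(Arrays.copyOf(docs, size)));
  }
}
```

The lane-count argument is simple arithmetic: a 256-bit vector register holds 8 `float` values but only 4 `double` values, so the same comparison covers twice as many scores per instruction when the scores are `float`.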
Luceneutil on wikibigall reports the following:
```
Task                         QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff             p-value
OrMany                              23.21  (2.1%)                    22.81  (2.9%)  -1.7% ( -6% - 3%)    0.033
OrHighRare                         285.51  (8.2%)                   281.68  (6.2%)  -1.3% ( -14% - 14%)  0.559
FilteredPrefix3                    148.66  (1.6%)                   146.92  (2.7%)  -1.2% ( -5% - 3%)    0.092
TermTitleSort                       82.88  (4.9%)                    82.17  (5.0%)  -0.9% ( -10% - 9%)   0.585
TermDayOfYearSort                  278.76  (1.9%)                   276.94  (2.7%)  -0.7% ( -5% - 4%)    0.380
FilteredOrMany                      16.33  (2.2%)                    16.25  (1.7%)  -0.5% ( -4% - 3%)    0.405
FilteredOr3Terms                   164.07  (1.2%)                   163.54  (0.9%)  -0.3% ( -2% - 1%)    0.314
FilteredOr2Terms2StopWords         144.33  (0.9%)                   144.00  (0.9%)  -0.2% ( -2% - 1%)    0.431
FilteredOrStopWords                 44.89  (2.2%)                    44.86  (1.6%)  -0.1% ( -3% - 3%)    0.915
FilteredOrHighMed                  150.37  (1.0%)                   150.29  (1.0%)  -0.1% ( -2% - 1%)    0.858
TermDTSort                         376.36  (2.7%)                   376.45  (3.6%)   0.0% ( -6% - 6%)    0.982
FilteredTerm                       159.30  (3.0%)                   159.40  (2.5%)   0.1% ( -5% - 5%)    0.943
FilteredOrHighHigh                  66.16  (1.8%)                    66.20  (1.6%)   0.1% ( -3% - 3%)    0.903
FilteredAnd2Terms2StopWords        209.89  (1.3%)                   210.13  (1.4%)   0.1% ( -2% - 2%)    0.792
FilteredAnd3Terms                  186.68  (1.1%)                   187.01  (1.0%)   0.2% ( -1% - 2%)    0.602
CountFilteredOrHighMed             146.93  (0.7%)                   147.23  (1.0%)   0.2% ( -1% - 1%)    0.464
FilteredAndHighHigh                 77.00  (1.4%)                    77.18  (3.1%)   0.2% ( -4% - 4%)    0.754
FilteredIntNRQ                     286.87  (1.2%)                   287.61  (1.4%)   0.3% ( -2% - 2%)    0.539
FilteredAndStopWords                63.47  (1.5%)                    63.69  (3.3%)   0.3% ( -4% - 5%)    0.678
FilteredAndHighMed                 152.84  (1.4%)                   153.41  (1.6%)   0.4% ( -2% - 3%)    0.441
CountAndHighMed                    303.73  (1.3%)                   305.00  (1.6%)   0.4% ( -2% - 3%)    0.376
CountFilteredOrHighHigh            135.14  (0.9%)                   135.71  (1.0%)   0.4% ( -1% - 2%)    0.159
FilteredPhrase                      31.18  (1.5%)                    31.32  (2.1%)   0.4% ( -3% - 4%)    0.445
CountOrMany                         28.24  (2.1%)                    28.37  (2.2%)   0.5% ( -3% - 4%)    0.503
CountFilteredPhrase                 24.66  (1.7%)                    24.79  (2.9%)   0.5% ( -3% - 5%)    0.478
CountAndHighHigh                   350.62  (2.1%)                   352.70  (2.7%)   0.6% ( -4% - 5%)    0.436
CountOrHighMed                     352.79  (1.6%)                   355.12  (2.1%)   0.7% ( -3% - 4%)    0.267
CountFilteredOrMany                 26.46  (1.8%)                    26.64  (1.9%)   0.7% ( -2% - 4%)    0.249
TermMonthSort                     3167.86  (2.4%)                  3189.35  (2.7%)   0.7% ( -4% - 5%)    0.395
Or2Terms2StopWords                 201.14  (1.5%)                   202.57  (1.4%)   0.7% ( -2% - 3%)    0.131
CombinedOrHighMed                   86.87  (0.8%)                    87.50  (2.7%)   0.7% ( -2% - 4%)    0.253
CountTerm                         8625.60  (2.6%)                  8688.85  (2.8%)   0.7% ( -4% - 6%)    0.392
CombinedOrHighHigh                  22.80  (1.0%)                    22.99  (3.7%)   0.8% ( -3% - 5%)    0.328
Or3Terms                           226.48  (1.6%)                   228.60  (1.8%)   0.9% ( -2% - 4%)    0.089
And2Terms2StopWords                200.20  (1.7%)                   202.22  (1.3%)   1.0% ( -1% - 4%)    0.036
CountOrHighHigh                    333.72  (2.3%)                   337.23  (2.4%)   1.1% ( -3% - 5%)    0.156
CombinedAndHighHigh                 23.04  (0.8%)                    23.30  (1.3%)   1.1% ( -1% - 3%)    0.001
And3Terms                          235.22  (1.8%)                   237.94  (2.0%)   1.2% ( -2% - 5%)    0.055
CombinedAndHighMed                  88.23  (0.7%)                    89.37  (1.2%)   1.3% ( 0% - 3%)     0.000
CountPhrase                          4.09  (3.0%)                     4.15  (1.8%)   1.5% ( -3% - 6%)    0.066
AndMedOrHighHigh                    86.57  (1.3%)                    87.90  (1.9%)   1.5% ( -1% - 4%)    0.003
AndHighOrMedMed                     50.13  (2.0%)                    51.26  (2.1%)   2.3% ( -1% - 6%)    0.000
OrStopWords                         47.70  (2.6%)                    48.98  (2.6%)   2.7% ( -2% - 8%)    0.001
AndStopWords                        45.93  (2.9%)                    47.37  (2.4%)   3.1% ( -2% - 8%)    0.000
CombinedTerm                        38.49  (3.9%)                    39.80  (1.2%)   3.4% ( -1% - 8%)    0.000
OrHighMed                          251.67  (1.9%)                   260.48  (2.3%)   3.5% ( 0% - 7%)     0.000
Term                               652.30  (5.9%)                   675.34  (5.0%)   3.5% ( -6% - 15%)   0.041
AndHighMed                         198.48  (2.1%)                   205.50  (2.1%)   3.5% ( 0% - 7%)     0.000
AndHighHigh                         67.67  (2.8%)                    70.74  (2.6%)   4.5% ( 0% - 10%)    0.000
OrHighHigh                          76.43  (2.4%)                    80.03  (2.4%)   4.7% ( 0% - 9%)     0.000
```