jpountz opened a new pull request, #14970: URL: https://github.com/apache/lucene/pull/14970
PRs #14906 and #14896 improved the efficiency of filtering by score. This PR tries to get some extra speedup by:

- Skipping filtering by score when applying a non-essential clause that doesn't have matches over the range of doc IDs being scored.
- Filtering on float[] scores rather than double[] scores whenever applicable so that vectorization can work on 2x more lanes at once.
- Filtering by score using `VectorUtil#filterByScore` instead of relying on the collector to do it.

Luceneutil on wikibigall reports the following:

```
Task                        QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
OrMany                         23.21 (2.1%)    22.81 (2.9%)  -1.7% ( -6% -   3%) 0.033
OrHighRare                    285.51 (8.2%)   281.68 (6.2%)  -1.3% (-14% -  14%) 0.559
FilteredPrefix3               148.66 (1.6%)   146.92 (2.7%)  -1.2% ( -5% -   3%) 0.092
TermTitleSort                  82.88 (4.9%)    82.17 (5.0%)  -0.9% (-10% -   9%) 0.585
TermDayOfYearSort             278.76 (1.9%)   276.94 (2.7%)  -0.7% ( -5% -   4%) 0.380
FilteredOrMany                 16.33 (2.2%)    16.25 (1.7%)  -0.5% ( -4% -   3%) 0.405
FilteredOr3Terms              164.07 (1.2%)   163.54 (0.9%)  -0.3% ( -2% -   1%) 0.314
FilteredOr2Terms2StopWords    144.33 (0.9%)   144.00 (0.9%)  -0.2% ( -2% -   1%) 0.431
FilteredOrStopWords            44.89 (2.2%)    44.86 (1.6%)  -0.1% ( -3% -   3%) 0.915
FilteredOrHighMed             150.37 (1.0%)   150.29 (1.0%)  -0.1% ( -2% -   1%) 0.858
TermDTSort                    376.36 (2.7%)   376.45 (3.6%)   0.0% ( -6% -   6%) 0.982
FilteredTerm                  159.30 (3.0%)   159.40 (2.5%)   0.1% ( -5% -   5%) 0.943
FilteredOrHighHigh             66.16 (1.8%)    66.20 (1.6%)   0.1% ( -3% -   3%) 0.903
FilteredAnd2Terms2StopWords   209.89 (1.3%)   210.13 (1.4%)   0.1% ( -2% -   2%) 0.792
FilteredAnd3Terms             186.68 (1.1%)   187.01 (1.0%)   0.2% ( -1% -   2%) 0.602
CountFilteredOrHighMed        146.93 (0.7%)   147.23 (1.0%)   0.2% ( -1% -   1%) 0.464
FilteredAndHighHigh            77.00 (1.4%)    77.18 (3.1%)   0.2% ( -4% -   4%) 0.754
FilteredIntNRQ                286.87 (1.2%)   287.61 (1.4%)   0.3% ( -2% -   2%) 0.539
FilteredAndStopWords           63.47 (1.5%)    63.69 (3.3%)   0.3% ( -4% -   5%) 0.678
FilteredAndHighMed            152.84 (1.4%)   153.41 (1.6%)   0.4% ( -2% -   3%) 0.441
CountAndHighMed               303.73 (1.3%)   305.00 (1.6%)   0.4% ( -2% -   3%) 0.376
CountFilteredOrHighHigh       135.14 (0.9%)   135.71 (1.0%)   0.4% ( -1% -   2%) 0.159
FilteredPhrase                 31.18 (1.5%)    31.32 (2.1%)   0.4% ( -3% -   4%) 0.445
CountOrMany                    28.24 (2.1%)    28.37 (2.2%)   0.5% ( -3% -   4%) 0.503
CountFilteredPhrase            24.66 (1.7%)    24.79 (2.9%)   0.5% ( -3% -   5%) 0.478
CountAndHighHigh              350.62 (2.1%)   352.70 (2.7%)   0.6% ( -4% -   5%) 0.436
CountOrHighMed                352.79 (1.6%)   355.12 (2.1%)   0.7% ( -3% -   4%) 0.267
CountFilteredOrMany            26.46 (1.8%)    26.64 (1.9%)   0.7% ( -2% -   4%) 0.249
TermMonthSort                3167.86 (2.4%)  3189.35 (2.7%)   0.7% ( -4% -   5%) 0.395
Or2Terms2StopWords            201.14 (1.5%)   202.57 (1.4%)   0.7% ( -2% -   3%) 0.131
CombinedOrHighMed              86.87 (0.8%)    87.50 (2.7%)   0.7% ( -2% -   4%) 0.253
CountTerm                    8625.60 (2.6%)  8688.85 (2.8%)   0.7% ( -4% -   6%) 0.392
CombinedOrHighHigh             22.80 (1.0%)    22.99 (3.7%)   0.8% ( -3% -   5%) 0.328
Or3Terms                      226.48 (1.6%)   228.60 (1.8%)   0.9% ( -2% -   4%) 0.089
And2Terms2StopWords           200.20 (1.7%)   202.22 (1.3%)   1.0% ( -1% -   4%) 0.036
CountOrHighHigh               333.72 (2.3%)   337.23 (2.4%)   1.1% ( -3% -   5%) 0.156
CombinedAndHighHigh            23.04 (0.8%)    23.30 (1.3%)   1.1% ( -1% -   3%) 0.001
And3Terms                     235.22 (1.8%)   237.94 (2.0%)   1.2% ( -2% -   5%) 0.055
CombinedAndHighMed             88.23 (0.7%)    89.37 (1.2%)   1.3% (  0% -   3%) 0.000
CountPhrase                     4.09 (3.0%)     4.15 (1.8%)   1.5% ( -3% -   6%) 0.066
AndMedOrHighHigh               86.57 (1.3%)    87.90 (1.9%)   1.5% ( -1% -   4%) 0.003
AndHighOrMedMed                50.13 (2.0%)    51.26 (2.1%)   2.3% ( -1% -   6%) 0.000
OrStopWords                    47.70 (2.6%)    48.98 (2.6%)   2.7% ( -2% -   8%) 0.001
AndStopWords                   45.93 (2.9%)    47.37 (2.4%)   3.1% ( -2% -   8%) 0.000
CombinedTerm                   38.49 (3.9%)    39.80 (1.2%)   3.4% ( -1% -   8%) 0.000
OrHighMed                     251.67 (1.9%)   260.48 (2.3%)   3.5% (  0% -   7%) 0.000
Term                          652.30 (5.9%)   675.34 (5.0%)   3.5% ( -6% -  15%) 0.041
AndHighMed                    198.48 (2.1%)   205.50 (2.1%)   3.5% (  0% -   7%) 0.000
AndHighHigh                    67.67 (2.8%)    70.74 (2.6%)   4.5% (  0% -  10%) 0.000
OrHighHigh                     76.43 (2.4%)    80.03 (2.4%)   4.7% (  0% -   9%) 0.000
```
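To illustrate what filtering by score on `float[]` looks like, here is a minimal standalone sketch. It is not Lucene's `VectorUtil#filterByScore` implementation and does not claim to match its signature: the class name `ScoreFilterSketch`, the method signature, and the sample data are all assumptions made for illustration. The loop always writes to the output cursor and advances it only when the score is competitive, a shape that tends to be friendlier to auto-vectorization than a branch around the stores. A `float` is half the width of a `double`, so a SIMD register of a given width holds twice as many scores.

```java
import java.util.Arrays;

public class ScoreFilterSketch {

  /**
   * Hypothetical helper (not Lucene's API): compacts docs/scores in place,
   * keeping only the entries whose score is >= minScoreInclusive.
   * Returns the number of entries kept; they occupy indices [0, kept).
   */
  public static int filterByScore(int[] docs, float[] scores, float minScoreInclusive, int size) {
    int kept = 0;
    for (int i = 0; i < size; i++) {
      int doc = docs[i];
      float score = scores[i];
      // Unconditional stores: the slot is overwritten later if this entry is rejected.
      docs[kept] = doc;
      scores[kept] = score;
      // Advance the write cursor only for competitive entries.
      if (score >= minScoreInclusive) {
        kept++;
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Sample data invented for this sketch.
    int[] docs = {3, 7, 12, 20, 41};
    float[] scores = {0.4f, 1.9f, 1.1f, 2.5f, 0.9f};
    int kept = filterByScore(docs, scores, 1.0f, docs.length);
    // prints "3 [7, 12, 20]"
    System.out.println(kept + " " + Arrays.toString(Arrays.copyOf(docs, kept)));
  }
}
```

The first optimization in the list would sit in front of such a call: if a non-essential clause has no matches in the current doc ID range, the scores have not changed since the last filter pass, so the call can be skipped for that clause.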