[ https://issues.apache.org/jira/browse/LUCENE-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-7993:
---------------------------------
Attachment: LUCENE-7993.patch
Here is a patch that applies on top of LUCENE-4100 to show the idea. Luceneutil
results on wikimedium10m show notable gains for several query types:
{noformat}
Task                  QPS baseline   StdDev   QPS patch   StdDev   Pct diff
OrHighNotLow                 88.30   (4.4%)       72.67   (2.4%)   -17.7% ( -23% -  -11%)
OrHighNotMed                 93.18   (3.3%)       86.58   (1.9%)    -7.1% ( -11% -   -1%)
OrNotHighLow               1386.80   (4.0%)     1289.38   (3.3%)    -7.0% ( -13% -    0%)
OrHighNotHigh                49.84   (3.2%)       47.59   (1.7%)    -4.5% (  -9% -    0%)
Fuzzy2                      196.79  (16.6%)      188.44   (7.7%)    -4.2% ( -24% -   24%)
HighSpanNear                 58.01   (2.2%)       56.18   (2.4%)    -3.2% (  -7% -    1%)
OrNotHighMed                184.60   (1.7%)      178.77   (2.4%)    -3.2% (  -7% -    0%)
AndHighMed                  224.60   (1.9%)      217.95   (2.3%)    -3.0% (  -7% -    1%)
LowSpanNear                 143.79   (2.4%)      139.98   (2.4%)    -2.7% (  -7% -    2%)
IntNRQ                       19.47   (4.2%)       19.13   (5.0%)    -1.8% ( -10% -    7%)
MedTerm                     248.95   (2.3%)      244.80   (1.9%)    -1.7% (  -5% -    2%)
LowTerm                     766.37   (3.6%)      758.11   (3.9%)    -1.1% (  -8% -    6%)
HighTerm                    131.14   (2.5%)      129.74   (2.6%)    -1.1% (  -5% -    4%)
AndHighHigh                  30.70   (2.4%)       30.40   (1.5%)    -1.0% (  -4% -    3%)
OrNotHighHigh                55.99   (2.7%)       55.50   (1.7%)    -0.9% (  -5% -    3%)
Prefix3                     105.33   (4.8%)      104.60   (3.6%)    -0.7% (  -8% -    8%)
MedSpanNear                  13.38   (2.3%)       13.30   (2.1%)    -0.6% (  -4% -    3%)
Wildcard                     84.93   (4.8%)       84.59   (3.7%)    -0.4% (  -8% -    8%)
AndHighLow                 1419.89   (3.3%)     1432.43   (2.8%)     0.9% (  -4% -    7%)
LowSloppyPhrase              38.50   (3.0%)       39.02   (1.7%)     1.3% (  -3% -    6%)
HighSloppyPhrase             15.85   (4.2%)       16.10   (2.4%)     1.6% (  -4% -    8%)
MedSloppyPhrase             118.20   (3.8%)      120.36   (2.4%)     1.8% (  -4% -    8%)
Respell                     272.44   (6.5%)      279.22   (3.5%)     2.5% (  -7% -   13%)
HighTermMonthSort           226.59   (9.1%)      233.94   (9.1%)     3.2% ( -13% -   23%)
Fuzzy1                      163.36  (10.6%)      171.95   (8.7%)     5.3% ( -12% -   27%)
LowPhrase                   195.93   (2.2%)      222.77   (2.2%)    13.7% (   9% -   18%)
OrHighHigh                   34.58   (6.4%)       45.87   (6.8%)    32.6% (  18% -   49%)
HighTermDayOfYearSort        65.42   (6.6%)       87.68  (12.5%)    34.0% (  14% -   56%)
MedPhrase                    40.05   (2.0%)       59.16   (2.3%)    47.7% (  42% -   53%)
OrHighMed                    41.35   (6.0%)       64.85   (7.3%)    56.8% (  41% -   74%)
HighPhrase                   22.51   (3.8%)       39.33   (4.0%)    74.8% (  64% -   85%)
OrHighLow                    61.15   (3.2%)      629.98  (41.3%)   930.3% ( 858% - 1007%)
{noformat}
The changes in disjunction performance come from MAXSCORE, but we can see that
{{LowPhrase}} (+13.7%), {{MedPhrase}} (+47.7%) and {{HighPhrase}} (+74.8%) get
good speedups too.
> Speed up phrase queries when total hit count is not needed
> ----------------------------------------------------------
>
> Key: LUCENE-7993
> URL: https://issues.apache.org/jira/browse/LUCENE-7993
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7993.patch
>
>
> Follow-up to LUCENE-4100: while thinking about the API that we needed to
> introduce to support MAXSCORE, I wondered whether the same API could support
> other optimizations. The idea is that when running phrase queries, we already
> have access to the term frequency of each term before we start reading
> positions, and the frequency of the phrase is bounded by the minimum term
> frequency of the involved terms. So if the score for that minimum term
> frequency is not competitive, then the score for the phrase is not
> competitive either, provided we can assume that the score increases (or
> stays the same) when the term frequency increases. That sounds like an OK
> requirement for a sane Similarity?
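The bound described in the issue can be sketched in a few lines of Java. This is a hypothetical illustration, not code from the attached patch and not Lucene's API: the class and method names ({{PhraseSkipSketch}}, {{bm25}}, {{maxPhraseFreq}}, {{canSkip}}) are made up, and the scoring function is an assumed BM25-style term-frequency formula with a fixed document length. It shows the single check the idea needs: take the minimum term frequency as an upper bound on the phrase frequency, score that bound, and skip reading positions when even that score is below the minimum competitive score.

```java
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical sketch (not Lucene's API): decide whether a document can be
 * skipped before reading positions, using the fact that a phrase's frequency
 * can never exceed the smallest frequency of its terms.
 */
public class PhraseSkipSketch {

  /**
   * Assumed BM25-style score as a function of frequency, for one fixed
   * document length. For k1 > 0 and 0 <= b <= 1 it never decreases as freq
   * grows, which is the property the optimization relies on.
   */
  static IntToDoubleFunction bm25(double idf, double k1, double b, double docLen, double avgDocLen) {
    double norm = k1 * (1 - b + b * docLen / avgDocLen);
    return freq -> idf * freq * (k1 + 1) / (freq + norm);
  }

  /** Upper bound on the phrase frequency: the minimum term frequency. */
  static int maxPhraseFreq(int[] termFreqs) {
    int min = Integer.MAX_VALUE;
    for (int f : termFreqs) {
      min = Math.min(min, f);
    }
    return min;
  }

  /**
   * Returns true if the document cannot be competitive, so its positions
   * never need to be read. Only valid if score(freq) is non-decreasing in freq.
   */
  static boolean canSkip(int[] termFreqs, IntToDoubleFunction score, double minCompetitiveScore) {
    double upperBound = score.applyAsDouble(maxPhraseFreq(termFreqs));
    return upperBound < minCompetitiveScore;
  }

  public static void main(String[] args) {
    IntToDoubleFunction score = bm25(2.0, 1.2, 0.75, 100, 100);
    // Per-term frequencies of a 3-term phrase in one document: the phrase
    // can occur at most once, so its score is at most score(1) = 2.0.
    int[] termFreqs = {7, 1, 3};
    System.out.println(canSkip(termFreqs, score, 2.5)); // true: skip positions
    System.out.println(canSkip(termFreqs, score, 1.5)); // false: must read positions
  }
}
```

The check is only safe when the Similarity's score never decreases as the frequency grows; with a non-monotonic scoring function, the score of the minimum term frequency would no longer bound the phrase score.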
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)