[ https://issues.apache.org/jira/browse/LUCENE-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220743#comment-16220743 ]
Adrien Grand commented on LUCENE-7993:
--------------------------------------
Benchmarks on wikibig this time, which is more appropriate since artificially
truncated documents defeat the purpose of this optimization. HighPhrase is now
about 3x faster.
{noformat}
Task                   QPS baseline    StdDev  QPS patch    StdDev  Pct diff
OrHighHigh                    97.15    (3.7%)      85.83    (3.6%)   -11.7% ( -18% - -4%)
Fuzzy2                       142.85    (8.7%)     131.63   (11.0%)    -7.9% ( -25% - 12%)
Fuzzy1                       216.22    (9.6%)     200.10    (8.1%)    -7.5% ( -22% - 11%)
MedSloppyPhrase                8.02    (7.4%)       7.78   (10.1%)    -3.0% ( -19% - 15%)
HighSloppyPhrase              31.23    (5.7%)      30.59    (7.7%)    -2.0% ( -14% - 12%)
MedSpanNear                  124.68    (4.7%)     122.26    (4.7%)    -1.9% ( -10% - 7%)
LowSpanNear                   34.39    (8.2%)      33.90    (8.0%)    -1.4% ( -16% - 16%)
LowSloppyPhrase               27.55    (5.1%)      27.28    (6.8%)    -1.0% ( -12% - 11%)
IntNRQ                       164.57    (7.2%)     163.10    (8.5%)    -0.9% ( -15% - 16%)
HighSpanNear                  48.43    (4.5%)      48.03    (4.2%)    -0.8% ( -9% - 8%)
Respell                      226.20    (3.1%)     225.11    (4.7%)    -0.5% ( -8% - 7%)
AndHighLow                  1211.79    (3.9%)    1211.37    (3.1%)    -0.0% ( -6% - 7%)
AndHighMed                   130.59    (2.0%)     130.71    (1.8%)     0.1% ( -3% - 3%)
HighTermMonthSort            307.88    (7.8%)     308.47    (8.4%)     0.2% ( -14% - 17%)
MedTerm                      361.52    (2.9%)     362.23    (2.8%)     0.2% ( -5% - 6%)
AndHighHigh                  114.80    (1.9%)     115.38    (1.8%)     0.5% ( -3% - 4%)
Prefix3                      248.47    (5.0%)     249.86    (5.3%)     0.6% ( -9% - 11%)
HighTerm                     201.95    (2.9%)     203.53    (2.9%)     0.8% ( -4% - 6%)
Wildcard                     224.17    (4.4%)     226.12    (3.9%)     0.9% ( -7% - 9%)
LowTerm                     1862.62    (3.6%)    1903.87    (4.2%)     2.2% ( -5% - 10%)
OrHighMed                    106.09    (4.6%)     145.10    (5.5%)    36.8% ( 25% - 49%)
LowPhrase                     81.86    (5.9%)     112.43    (3.5%)    37.4% ( 26% - 49%)
HighTermDayOfYearSort        227.00    (7.3%)     312.89   (10.6%)    37.8% ( 18% - 60%)
MedPhrase                     17.95   (14.2%)      43.93   (15.1%)   144.7% ( 101% - 202%)
HighPhrase                    29.28    (7.5%)      87.43    (8.6%)   198.6% ( 169% - 231%)
OrHighLow                    110.21    (3.9%)     835.01   (34.0%)   657.6% ( 596% - 723%)
{noformat}
> Speed up phrase queries when total hit count is not needed
> ----------------------------------------------------------
>
> Key: LUCENE-7993
> URL: https://issues.apache.org/jira/browse/LUCENE-7993
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7993.patch
>
>
> Follow-up of LUCENE-4100: When thinking about the API that we needed to
> introduce to support MAXSCORE, I wondered whether the same API could support
> other optimizations. The idea is that when running phrase queries, we already
> have access to the term frequency of each term before we start reading
> positions, and the frequency of the phrase is bounded by the minimum term
> frequency of the involved terms. So if the score for that minimum term
> frequency is not competitive, then the score for the phrase is not
> competitive either, provided we can assume that the score increases (or
> stagnates) when the term freq increases, which sounds like an ok requirement
> for a sane Similarity?
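
The pruning argument in the issue description can be sketched in a few lines.
This is only an illustration, not the attached patch: the class
PhraseSkipSketch, its method names, and the BM25-like saturation
freq / (freq + k1) are assumptions, chosen because that score never decreases
when the frequency increases.

```java
class PhraseSkipSketch {

    // Hypothetical scoring function (an assumption, not Lucene's Similarity API).
    // It is non-decreasing in freq, which is the property the argument relies on.
    static double score(double boost, int freq) {
        final double k1 = 1.2;
        return boost * freq / (freq + k1);
    }

    // The phrase cannot occur more often than its rarest term in the document,
    // so the minimum term frequency is an upper bound on the phrase frequency.
    // Expects at least one term frequency.
    static int maxPhraseFreq(int[] termFreqs) {
        int min = Integer.MAX_VALUE;
        for (int f : termFreqs) {
            min = Math.min(min, f);
        }
        return min;
    }

    // If even the upper bound on the score is below the minimum competitive
    // score, the document can be skipped without reading any positions.
    static boolean canSkipPositions(int[] termFreqs, double boost, double minCompetitiveScore) {
        return score(boost, maxPhraseFreq(termFreqs)) < minCompetitiveScore;
    }

    public static void main(String[] args) {
        int[] termFreqs = {7, 2, 5}; // per-term frequencies in one document
        // Upper bound on the score is 2 / (2 + 1.2) = 0.625.
        System.out.println(canSkipPositions(termFreqs, 1.0, 0.9)); // true
        System.out.println(canSkipPositions(termFreqs, 1.0, 0.5)); // false
    }
}
```

In the first call the bound of 0.625 is below the threshold of 0.9, so
positions never need to be decoded for that document. In the second call the
bound is still competitive, so the phrase frequency has to be computed from
positions as usual.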