[
https://issues.apache.org/jira/browse/LUCENE-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrien Grand updated LUCENE-6244:
---------------------------------
Attachment: wikibig.tasks
LUCENE-6244.patch
I agree that our benchmarks should track the performance of BS2, since this
scorer is probably used quite often.
I worked on the patch some more to recover the lost performance. Because
things are now structured differently, the patch no longer guarantees that at
most one clause is confirmed per doc, but at least performance on simple
queries is back (with BS1 disabled this time):
{noformat}
Task                    QPS baseline   StdDev  QPS patch   StdDev  Pct diff
Respell                        70.77   (3.2%)      69.96   (5.6%)     -1.1%  ( -9% - 7%)
Fuzzy2                         57.49   (7.8%)      57.03  (10.1%)     -0.8%  ( -17% - 18%)
AndHighHigh                    90.13   (1.7%)      89.61   (2.1%)     -0.6%  ( -4% - 3%)
IntNRQ                          7.32   (5.2%)       7.28   (5.3%)     -0.5%  ( -10% - 10%)
OrNotHighLow                  824.56   (3.5%)     821.47   (4.0%)     -0.4%  ( -7% - 7%)
HighTerm                       73.82   (1.3%)      73.57   (1.1%)     -0.3%  ( -2% - 2%)
LowPhrase                      74.18   (1.9%)      73.96   (1.9%)     -0.3%  ( -4% - 3%)
HighSpanNear                   43.58   (3.4%)      43.49   (3.7%)     -0.2%  ( -7% - 7%)
Prefix3                        72.06   (3.9%)      71.91   (3.8%)     -0.2%  ( -7% - 7%)
PKLookup                      265.53   (3.1%)     265.02   (2.8%)     -0.2%  ( -5% - 5%)
HighPhrase                      4.24   (4.2%)       4.23   (4.4%)     -0.1%  ( -8% - 8%)
OrHighNotHigh                  35.52   (1.5%)      35.51   (1.6%)     -0.0%  ( -3% - 3%)
HighSloppyPhrase               27.77   (2.4%)      27.77   (2.8%)     -0.0%  ( -5% - 5%)
LowSpanNear                    24.53   (5.1%)      24.53   (5.7%)      0.0%  ( -10% - 11%)
MedSloppyPhrase                51.82   (2.5%)      51.83   (2.6%)      0.0%  ( -5% - 5%)
OrNotHighHigh                  36.18   (1.0%)      36.20   (1.2%)      0.1%  ( -2% - 2%)
LowSloppyPhrase                96.11   (2.6%)      96.18   (2.8%)      0.1%  ( -5% - 5%)
MedPhrase                     134.06   (2.0%)     134.18   (2.5%)      0.1%  ( -4% - 4%)
Fuzzy1                         64.22   (8.2%)      64.29   (6.3%)      0.1%  ( -13% - 15%)
AndHighMed                    206.17   (1.8%)     206.47   (2.5%)      0.1%  ( -4% - 4%)
Wildcard                       27.28   (2.3%)      27.32   (2.9%)      0.2%  ( -4% - 5%)
MedSpanNear                    36.58   (3.6%)      36.64   (4.1%)      0.2%  ( -7% - 8%)
AndHighLow                    882.47   (3.8%)     884.53   (4.4%)      0.2%  ( -7% - 8%)
MedTerm                       297.22   (1.1%)     297.91   (1.4%)      0.2%  ( -2% - 2%)
OrHighNotLow                   80.63   (2.3%)      80.85   (2.5%)      0.3%  ( -4% - 5%)
OrHighNotMed                   97.77   (2.3%)      98.11   (2.2%)      0.3%  ( -4% - 4%)
OrNotHighMed                  189.36   (1.8%)     190.11   (1.8%)      0.4%  ( -3% - 4%)
LowTerm                       820.55   (2.9%)     830.32   (2.5%)      1.2%  ( -4% - 6%)
OrHighHigh                     26.44   (4.5%)      27.58   (3.5%)      4.3%  ( -3% - 12%)
OrHighMed                      59.16   (4.4%)      62.87   (4.2%)      6.3%  ( -2% - 15%)
OrHighLow                       8.45   (4.5%)       9.10   (4.4%)      7.7%  ( -1% - 17%)
{noformat}
I also wanted to measure the overhead of propagating approximations to other
scorers such as conjunctions, so I modified the tasks from LUCENE-6198 to make
them look like {{+("phrase" term1) +term2}} (see the attached file). Here are
the results, which I think are encouraging:
{noformat}
Task                    QPS baseline   StdDev  QPS patch   StdDev  Pct diff
AndMedPhraseHighTerm           17.10   (2.3%)      15.47   (1.7%)     -9.5%  ( -13% - -5%)
AndHighPhraseHighTerm           9.04   (2.0%)       8.95   (1.2%)     -1.0%  ( -4% - 2%)
AndMedPhraseLowTerm           129.01   (5.2%)     147.93   (9.2%)     14.7%  ( 0% - 30%)
AndHighPhraseMedTerm           13.55   (2.4%)      15.90   (2.4%)     17.3%  ( 12% - 22%)
AndHighPhraseLowTerm           31.49   (2.7%)      38.07   (3.8%)     20.9%  ( 13% - 28%)
AndMedPhraseMedTerm            25.39   (2.6%)      37.93   (4.1%)     49.4%  ( 41% - 57%)
{noformat}
I also added more evil tests to TestApproximationSearchEquivalence.
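For readers who want the general idea behind the {{+("phrase" term1) +term2}} tasks, here is a minimal, self-contained sketch. It is not the patch and does not use Lucene classes: {{Clause}}, {{approximate}}, {{matches}}, {{confirmCalls}} and {{search}} are hypothetical names made up for this illustration. The assumption is that each clause offers a cheap approximation (a superset of its matches) and a costly confirmation, that the disjunction's approximation is the union of its clauses' approximations, and that the conjunction intersects approximations before any confirmation runs. The sketch stops at the first confirmed clause purely for simplicity; that is not a statement about how the patch behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

class DisjunctionApproximationSketch {

    /** Hypothetical clause: a cheap approximation plus a costly confirmation step. */
    static final class Clause {
        private final int[] candidates;      // sorted doc IDs, a superset of the true matches
        private final IntPredicate verifier; // costly check, e.g. reading phrase positions
        int confirmCalls = 0;                // how many times the costly check ran

        Clause(int[] candidates, IntPredicate verifier) {
            this.candidates = candidates;
            this.verifier = verifier;
        }

        /** Cheap test: is this doc a candidate for the clause? */
        boolean approximate(int doc) {
            return Arrays.binarySearch(candidates, doc) >= 0;
        }

        /** Costly test: does the candidate doc really match? */
        boolean matches(int doc) {
            confirmCalls++;
            return verifier.test(doc);
        }
    }

    /** Evaluates +(should[0] should[1] ...) +required over doc IDs in [0, maxDoc). */
    static List<Integer> search(List<Clause> should, int[] required, int maxDoc) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            // Cheap phase: the required term must contain the doc (sorted array lookup).
            if (Arrays.binarySearch(required, doc) < 0) {
                continue;
            }
            // The disjunction is a candidate if any clause approximates the doc;
            // only such clauses are asked to run their costly confirmation.
            for (Clause clause : should) {
                if (clause.approximate(doc) && clause.matches(doc)) {
                    hits.add(doc);
                    break; // simplification: stop at the first confirmed clause
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "phrase" has four candidates, but only docs 3 and 7 really match.
        Clause phrase = new Clause(new int[] {1, 3, 5, 7}, doc -> doc == 3 || doc == 7);
        Clause term1 = new Clause(new int[] {2, 3}, doc -> true);
        int[] term2 = {3, 4, 5};

        List<Integer> hits = search(Arrays.asList(phrase, term1), term2, 8);
        System.out.println("hits=" + hits + " phraseConfirmations=" + phrase.confirmCalls);
    }
}
```

With this toy data the only hit is doc 3, and the costly phrase check runs twice (docs 3 and 5) rather than once for each of its four candidates, because docs 1 and 7 are ruled out by term2 before any confirmation happens.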
> Approximations on disjunctions
> ------------------------------
>
> Key: LUCENE-6244
> URL: https://issues.apache.org/jira/browse/LUCENE-6244
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Fix For: Trunk, 5.1
>
> Attachments: LUCENE-6244.patch, LUCENE-6244.patch, wikibig.tasks
>
>
> Like we just did on exact phrases and conjunctions, we should also support
> approximations on disjunctions in order to apply "matches()" lazily.