[ https://issues.apache.org/jira/browse/LUCENE-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-7421:
---------------------------------
Attachment: LUCENE-7421.patch
Here is a patch. I tested it on wikimedium10m by disabling BS1:
{noformat}
Task               QPS baseline   StdDev   QPS patch   StdDev   Pct diff
MedSloppyPhrase           31.52   (5.8%)       31.38   (6.1%)   -0.4% ( -11% - 12%)
LowPhrase                 79.45   (3.7%)       79.12   (5.3%)   -0.4% ( -9% - 8%)
OrNotHighMed             160.15   (3.4%)      159.62   (3.0%)   -0.3% ( -6% - 6%)
LowSloppyPhrase           18.74   (6.8%)       18.70   (6.7%)   -0.2% ( -12% - 14%)
AndHighLow               571.10   (5.8%)      570.18   (6.4%)   -0.2% ( -11% - 12%)
HighTerm                  93.87   (7.1%)       93.83   (6.1%)   -0.1% ( -12% - 14%)
LowSpanNear              191.42   (4.2%)      191.59   (4.2%)    0.1% ( -8% - 8%)
HighSpanNear               2.69   (4.8%)        2.70   (5.4%)    0.1% ( -9% - 10%)
OrNotHighLow             766.17   (7.5%)      767.55   (5.4%)    0.2% ( -11% - 14%)
OrHighNotHigh             56.81   (4.5%)       56.93   (4.4%)    0.2% ( -8% - 9%)
Respell                   63.21   (6.6%)       63.39   (5.7%)    0.3% ( -11% - 13%)
HighSloppyPhrase           2.78   (8.4%)        2.79   (8.0%)    0.4% ( -14% - 18%)
IntNRQ                    11.20  (19.8%)       11.26  (19.5%)    0.5% ( -32% - 49%)
Prefix3                   99.08   (8.3%)       99.59   (6.8%)    0.5% ( -13% - 17%)
MedTerm                  224.98   (6.1%)      226.23   (5.4%)    0.6% ( -10% - 12%)
AndHighMed               234.21   (3.9%)      235.65   (2.9%)    0.6% ( -5% - 7%)
LowTerm                  565.85  (10.8%)      570.49  (11.3%)    0.8% ( -19% - 25%)
AndHighHigh               66.68   (4.0%)       67.23   (3.2%)    0.8% ( -6% - 8%)
MedSpanNear               55.15   (5.9%)       55.67   (3.6%)    0.9% ( -8% - 11%)
OrHighNotLow              75.71   (7.8%)       76.44   (6.3%)    1.0% ( -12% - 16%)
Wildcard                  15.89   (8.5%)       16.05   (6.9%)    1.0% ( -13% - 17%)
OrNotHighHigh             50.83   (5.4%)       51.38   (3.7%)    1.1% ( -7% - 10%)
MedPhrase                 31.99   (6.5%)       32.41   (2.9%)    1.3% ( -7% - 11%)
HighPhrase                23.83   (5.4%)       24.18   (3.6%)    1.5% ( -7% - 11%)
Fuzzy1                    39.46   (8.5%)       40.13   (7.0%)    1.7% ( -12% - 18%)
OrHighNotMed              70.05   (6.8%)       71.36   (5.6%)    1.9% ( -9% - 15%)
OrHighHigh                18.82   (6.0%)       19.57   (4.7%)    4.0% ( -6% - 15%)
Fuzzy2                    49.95  (17.2%)       52.28  (17.2%)    4.7% ( -25% - 47%)
OrHighMed                 18.20   (8.8%)       20.01   (6.9%)    9.9% ( -5% - 28%)
OrHighLow                 47.39   (7.2%)       52.25   (6.0%)   10.2% ( -2% - 25%)
{noformat}
All 3 disjunctions (OrHighHigh, OrHighMed and OrHighLow) got a performance
boost, especially those whose clauses have very different doc frequencies:
OrHighMed (+9.9%) and OrHighLow (+10.2%).
> Speed up BS2 by caching the 2nd lowest doc id in the priority queue
> -------------------------------------------------------------------
>
> Key: LUCENE-7421
> URL: https://issues.apache.org/jira/browse/LUCENE-7421
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7421.patch
>
>
> BS2 uses a priority queue in order to merge several sorted iterators into a
> new sorted iterator. We call updateTop every time that we move the 'top'
> iterator forward, which requires checking the size of the priority queue at
> least twice and performing at least two comparisons of doc ids.
> Instead, DisjunctionSumScorer could cache the 2nd lowest doc id of the
> priority queue and only call updateTop when the doc id of the entry at the
> top of the priority queue goes beyond the 2nd lowest doc id. While this would
> involve slightly more work in the case that the PQ has two high-cardinality
> clauses whose docs are interleaved, this would help when one clause has a
> much higher cardinality than the other ones or when the doc ids of the
> various clauses are clustered.
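
For readers who want to see the idea from the description in isolation, here is a minimal, self-contained sketch. It is not the attached patch and does not use Lucene's PriorityQueue or DisjunctionSumScorer: the class SecondLowestCacheSketch, the Cursor type and the updateTopCalls counter are all hypothetical names made up for illustration, and siftDown stands in for updateTop. It assumes each input array is sorted and free of duplicates.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical, not Lucene APIs. */
final class SecondLowestCacheSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Cursor over one sorted, duplicate-free array of doc ids. */
    static final class Cursor {
        final int[] docs;
        int pos = 0;
        Cursor(int[] docs) { this.docs = docs; }
        int doc() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        void next() { pos++; }
    }

    private final Cursor[] heap; // 1-based binary min-heap ordered by doc()
    private final int size;
    private int second;          // cached 2nd lowest doc id in the heap
    int updateTopCalls = 0;      // how often the heap had to be re-ordered

    SecondLowestCacheSketch(int[][] postings) {
        size = postings.length;
        heap = new Cursor[size + 1];
        for (int i = 0; i < size; i++) heap[i + 1] = new Cursor(postings[i]);
        for (int i = size / 2; i >= 1; i--) siftDown(i); // heapify
        second = secondLowest();
    }

    /** The 2nd lowest doc id is always held by one of the root's children. */
    private int secondLowest() {
        int s = NO_MORE_DOCS;
        if (size >= 2) s = Math.min(s, heap[2].doc());
        if (size >= 3) s = Math.min(s, heap[3].doc());
        return s;
    }

    /** Restores heap order below index i (plays the role of updateTop for i == 1). */
    private void siftDown(int i) {
        Cursor node = heap[i];
        while (2 * i <= size) {
            int child = 2 * i;
            if (child < size && heap[child + 1].doc() < heap[child].doc()) child++;
            if (heap[child].doc() >= node.doc()) break;
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = node;
    }

    /** Returns the sorted union of all doc ids. */
    List<Integer> merge() {
        List<Integer> out = new ArrayList<>();
        if (size == 0) return out;
        while (heap[1].doc() != NO_MORE_DOCS) {
            int current = heap[1].doc();
            out.add(current);
            // Move every cursor positioned on 'current' past it.
            while (heap[1].doc() == current) {
                heap[1].next();
                // Only re-order the heap once the top went beyond the cached value.
                if (heap[1].doc() > second) {
                    siftDown(1);
                    updateTopCalls++;
                    second = secondLowest();
                }
            }
        }
        return out;
    }
}
```

With one dense clause such as {1, ..., 10} and one sparse clause such as {5, 20}, the cursors advance 12 times in total but the heap is only re-ordered 3 times in this sketch, which is the situation the description expects to benefit. With two interleaved clauses nearly every advance still re-orders the heap and additionally pays for the comparison against the cached value.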