[ https://issues.apache.org/jira/browse/LUCENE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297093#comment-14297093 ]
Adrien Grand commented on LUCENE-6201:
--------------------------------------
Hmm, actually I played with it and the results are not good. I modified the
patch so that pure conjunctions are handled as disjunctions with
minShouldMatch=number_of_optional_clauses. It uses the same heuristic as the
current patch (cost > maxDoc / 3), so only AndHighHigh runs with
BooleanScorer:
{noformat}
Task              QPS baseline   StdDev  QPS patch   StdDev Pct diff
AndHighHigh              85.24   (1.8%)      57.61   (8.0%)   -32.4%  ( -41% - -23%)
Fuzzy2                   65.07  (12.3%)      59.37  (12.7%)    -8.8%  ( -30% - 18%)
Fuzzy1                   41.38   (3.4%)      39.09   (5.1%)    -5.5%  ( -13% - 3%)
AndHighLow              781.86   (3.7%)     753.32   (3.7%)    -3.6%  ( -10% - 3%)
HighSloppyPhrase         36.01   (2.9%)      35.20   (3.5%)    -2.2%  ( -8% - 4%)
OrHighLow                92.94   (4.4%)      91.18   (5.8%)    -1.9%  ( -11% - 8%)
MedSloppyPhrase          64.84   (3.0%)      63.67   (3.3%)    -1.8%  ( -7% - 4%)
AndHighMed              197.21   (2.0%)     194.06   (3.3%)    -1.6%  ( -6% - 3%)
LowSloppyPhrase          97.46   (2.0%)      96.00   (2.2%)    -1.5%  ( -5% - 2%)
OrHighMed                41.52   (4.8%)      41.15   (6.6%)    -0.9%  ( -11% - 11%)
HighPhrase               35.18   (7.3%)      34.97   (7.7%)    -0.6%  ( -14% - 15%)
OrHighHigh               25.83   (5.3%)      25.69   (6.9%)    -0.6%  ( -12% - 12%)
MedPhrase                31.25   (3.4%)      31.17   (3.8%)    -0.3%  ( -7% - 7%)
OrHighNotHigh            49.00   (1.9%)      48.87   (1.7%)    -0.3%  ( -3% - 3%)
MedTerm                 246.68   (2.2%)     246.06   (2.2%)    -0.3%  ( -4% - 4%)
OrHighNotMed             40.13   (2.9%)      40.05   (2.8%)    -0.2%  ( -5% - 5%)
HighTerm                 95.31   (2.3%)      95.20   (2.2%)    -0.1%  ( -4% - 4%)
IntNRQ                    7.70   (4.7%)       7.69   (4.9%)    -0.1%  ( -9% - 9%)
OrNotHighHigh            29.34   (1.4%)      29.32   (2.2%)    -0.1%  ( -3% - 3%)
Wildcard                 59.61   (2.9%)      59.60   (3.1%)    -0.0%  ( -5% - 6%)
LowSpanNear             115.48   (2.6%)     115.51   (2.4%)     0.0%  ( -4% - 5%)
LowPhrase               368.22   (3.1%)     368.37   (3.4%)     0.0%  ( -6% - 6%)
Prefix3                  97.37   (3.7%)      97.52   (3.9%)     0.2%  ( -7% - 8%)
LowTerm                 868.88   (4.8%)     870.53   (6.6%)     0.2%  ( -10% - 12%)
PKLookup                269.02   (3.2%)     269.55   (3.4%)     0.2%  ( -6% - 7%)
OrNotHighMed            113.70   (2.0%)     114.02   (2.5%)     0.3%  ( -4% - 4%)
MedSpanNear             109.45   (1.8%)     109.80   (1.8%)     0.3%  ( -3% - 3%)
OrHighNotLow            103.81   (3.1%)     104.17   (2.6%)     0.3%  ( -5% - 6%)
HighSpanNear             10.89   (2.1%)      10.94   (1.9%)     0.4%  ( -3% - 4%)
Respell                  86.63   (6.1%)      87.09   (6.1%)     0.5%  ( -10% - 13%)
OrNotHighLow            967.00   (4.1%)    1019.09   (3.3%)     5.4%  ( -1% - 13%)
{noformat}
BooleanScorer is not ready to run pure conjunctions. Actually this probably
makes sense: one of the reasons why disjunctions (with or without
minShouldMatch) are slow is that they need to update priority queues all the
time in order to figure out the next candidate. BooleanScorer speeds things up
by only balancing priority queues once per window. On the other hand,
ConjunctionScorer doesn't have this issue, so BooleanScorer mostly adds
overhead by first collecting into a bit set and then re-collecting from the bit
set into the actual collector.
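
To make the overhead argument concrete, here is a rough, self-contained sketch
in plain Java. It does not use Lucene's classes and is not the code from the
patch. Everything in it is an assumption made for illustration: the class and
method names, the window size of 2048, and the per-window count array that
stands in for the bit set. Only the cost > maxDoc / 3 heuristic comes from the
comment above. It contrasts a direct leapfrog intersection with a windowed
two-pass approach that treats a conjunction as a disjunction with
minShouldMatch equal to the number of clauses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WindowedConjunctionSketch {

    // Assumed window size, chosen only for this illustration.
    static final int WINDOW_SIZE = 2048;

    /** Heuristic quoted in the comment: go windowed only for costly clauses. */
    static boolean useWindowed(long cost, int maxDoc) {
        return cost > maxDoc / 3;
    }

    /** Direct intersection of sorted doc-ID arrays: no intermediate buffer. */
    static int[] leapfrog(int[]... postings) {
        if (postings.length == 0) return new int[0];
        List<Integer> hits = new ArrayList<>();
        int[] pos = new int[postings.length];
        outer:
        for (int lead : postings[0]) {
            for (int i = 1; i < postings.length; i++) {
                // Advance clause i to its first doc ID >= lead.
                while (pos[i] < postings[i].length && postings[i][pos[i]] < lead) pos[i]++;
                if (pos[i] == postings[i].length) break outer; // clause exhausted
                if (postings[i][pos[i]] != lead) continue outer; // lead is not a match
            }
            hits.add(lead);
        }
        return hits.stream().mapToInt(Integer::intValue).toArray();
    }

    /**
     * Windowed matching: each clause first marks its docs in a per-window
     * buffer, then the buffer is replayed to emit docs that reached
     * minShouldMatch. Doc IDs are assumed sorted, unique per clause, < maxDoc.
     */
    static int[] windowed(int maxDoc, int minShouldMatch, int[]... postings) {
        List<Integer> hits = new ArrayList<>();
        int[] pos = new int[postings.length];
        int[] counts = new int[WINDOW_SIZE];
        for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
            int end = Math.min(base + WINDOW_SIZE, maxDoc);
            Arrays.fill(counts, 0);
            // First pass: collect every clause's docs into the window buffer.
            for (int i = 0; i < postings.length; i++) {
                while (pos[i] < postings[i].length && postings[i][pos[i]] < end) {
                    counts[postings[i][pos[i]] - base]++;
                    pos[i]++;
                }
            }
            // Second pass: re-collect from the buffer into the final result.
            for (int slot = 0; slot < end - base; slot++) {
                if (counts[slot] >= minShouldMatch) hits.add(base + slot);
            }
        }
        return hits.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] a = {1, 5, 2050, 4000};
        int[] b = {5, 7, 2050, 4097};
        System.out.println(Arrays.toString(leapfrog(a, b)));
        System.out.println(Arrays.toString(windowed(5000, 2, a, b)));
    }
}
```

Both methods return the same matches for a pure conjunction, but the windowed
version touches every posting of every clause and scans each window a second
time, while the leapfrog version can skip ahead and emits hits directly. For a
disjunction the first pass replaces per-document priority queue updates, which
is where the windowed approach would be expected to pay off.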
> MinShouldMatchSumScorer should advance less and score lazily
> ------------------------------------------------------------
>
> Key: LUCENE-6201
> URL: https://issues.apache.org/jira/browse/LUCENE-6201
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: Trunk, 5.1
>
> Attachments: LUCENE-6201.patch, LUCENE-6201.patch
>
>
> MinShouldMatchSumScorer currently computes the score eagerly, even on
> documents that do not eventually match if it cannot find {{minShouldMatch}}
> matches on the same document.