[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100255#comment-14100255 ]
Da Huang commented on LUCENE-4396:
----------------------------------
I've tested again with exactly the same setup as Mike's.
Here are the results.
{code}
Task                QPS baseline   StdDev  QPS my_version   StdDev  Pct diff
HighSpanNear                1.05   (2.1%)            1.04   (2.1%)  -1.6% ( -5% -  2%)
HighSloppyPhrase            3.83   (5.3%)            3.78   (4.9%)  -1.3% (-10% -  9%)
LowTerm                    78.04   (4.5%)           77.13   (4.5%)  -1.2% ( -9% -  8%)
MedSpanNear                 2.89   (3.9%)            2.86   (3.3%)  -1.1% ( -8% -  6%)
LowSpanNear                 5.91   (4.9%)            5.84   (4.2%)  -1.1% ( -9% -  8%)
HighTerm                    8.02  (12.1%)            7.94  (11.4%)  -1.0% (-21% - 25%)
AndHighHigh                 9.84   (1.9%)            9.74   (2.4%)  -1.0% ( -5% -  3%)
MedTerm                    30.63   (4.7%)           30.35   (5.1%)  -0.9% (-10% -  9%)
LowSloppyPhrase             5.83   (4.4%)            5.79   (4.5%)  -0.7% ( -9% -  8%)
MedSloppyPhrase            16.86   (4.5%)           16.75   (4.3%)  -0.6% ( -9% -  8%)
OrHighMed                   7.57   (4.5%)            7.55   (4.1%)  -0.3% ( -8% -  8%)
OrNotHighLow                7.87   (5.3%)            7.84   (5.3%)  -0.3% (-10% - 10%)
AndHighMed                 25.10   (3.1%)           25.05   (3.7%)  -0.2% ( -6% -  6%)
Fuzzy2                     10.80   (2.7%)           10.78   (2.9%)  -0.1% ( -5% -  5%)
OrHighHigh                  8.75   (4.4%)            8.74   (4.1%)  -0.1% ( -8% -  8%)
OrHighNotMed                7.33   (4.4%)            7.33   (4.0%)  -0.1% ( -8% -  8%)
OrNotHighHigh               4.84   (5.1%)            4.84   (5.0%)  -0.1% ( -9% - 10%)
OrHighLow                   6.67   (4.6%)            6.66   (4.5%)  -0.1% ( -8% -  9%)
OrNotHighMed                2.90   (5.2%)            2.89   (5.2%)  -0.1% (-10% - 10%)
OrHighNotHigh               2.32   (4.9%)            2.32   (4.6%)  -0.0% ( -9% -  9%)
Fuzzy1                     20.35   (3.1%)           20.38   (3.4%)   0.1% ( -6% -  6%)
OrHighNotLow               13.54   (4.5%)           13.56   (4.2%)   0.2% ( -8% -  9%)
MedPhrase                  11.75   (3.2%)           11.78   (2.4%)   0.2% ( -5% -  5%)
LowPhrase                   6.08   (2.9%)            6.09   (2.7%)   0.2% ( -5% -  6%)
HighPhrase                 13.25   (3.8%)           13.29   (3.4%)   0.3% ( -6% -  7%)
Prefix3                    19.78   (3.2%)           19.85   (3.9%)   0.4% ( -6% -  7%)
Respell                    15.13   (3.1%)           15.19   (3.7%)   0.4% ( -6% -  7%)
Wildcard                    8.82   (3.3%)            8.89   (4.9%)   0.8% ( -7% -  9%)
IntNRQ                      0.85   (4.2%)            0.86   (6.0%)   1.3% ( -8% - 12%)
AndHighLow                172.85   (4.9%)          175.57   (4.7%)   1.6% ( -7% - 11%)
{code}
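For anyone reading the last column: the range in parentheses looks consistent with comparing the two QPS values one standard deviation apart in each direction, then truncating to whole percent. Below is a minimal sketch under that assumption. The formula is inferred from the numbers in this table, not copied from the benchmark scripts, and the function name `pct_diff_range` is made up for illustration.

```python
def pct_diff_range(base_qps, base_sd_pct, cmp_qps, cmp_sd_pct):
    """Return (pct_diff, worst, best), all in percent.

    Assumed reading of the report: "worst" pits the candidate minus one
    standard deviation against the baseline plus one; "best" is the reverse.
    """
    base_sd = base_qps * base_sd_pct / 100.0
    cmp_sd = cmp_qps * cmp_sd_pct / 100.0
    pct = 100.0 * (cmp_qps - base_qps) / base_qps
    worst = 100.0 * ((cmp_qps - cmp_sd) / (base_qps + base_sd) - 1.0)
    best = 100.0 * ((cmp_qps + cmp_sd) / (base_qps - base_sd) - 1.0)
    return pct, worst, best


# AndHighLow row: 172.85 (4.9%) vs 175.57 (4.7%)
pct, worst, best = pct_diff_range(172.85, 4.9, 175.57, 4.7)
print(f"{pct:.1f}% ({int(worst)}% - {int(best)}%)")  # prints "1.6% (-7% - 11%)"
```

Rows with small QPS values may be off by a point when recomputed this way, because the table shows QPS rounded to two decimals. Either way, every range above straddles zero, so none of the differences stands out from the noise.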
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks,
> LUCENE-4396-simple.patch, LUCENE-4396-simple.patch, LUCENE-4396-simple.patch,
> LUCENE-4396-simple.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, SIZE.perf, all.perf,
> luceneutil-score-equal.patch, luceneutil-score-equal.patch,
> merge-simple.perf, merge-simple.png, merge.perf, merge.png, perf.png,
> stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared
> to the other clauses, that BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!
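
The .advance() idea in the description above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not BooleanScorer or BooleanScorer2 code, the class `PostingsIterator` and the function `match_must_with_should` are hypothetical, and plain sorted lists stand in for real postings. When the MUST clause jumps far ahead, each SHOULD clause that has fallen behind is moved with a single advance() call instead of being stepped doc by doc.

```python
import bisect

NO_MORE_DOCS = 2**31 - 1  # stand-in sentinel meaning "iterator exhausted"


class PostingsIterator:
    """Hypothetical doc ID iterator over one clause's sorted hits."""

    def __init__(self, doc_ids):
        self.doc_ids = doc_ids
        self.pos = -1  # positioned before the first doc

    def doc_id(self):
        if self.pos < 0:
            return -1
        if self.pos >= len(self.doc_ids):
            return NO_MORE_DOCS
        return self.doc_ids[self.pos]

    def next_doc(self):
        self.pos += 1
        return self.doc_id()

    def advance(self, target):
        # Jump to the first doc >= target using binary search,
        # never moving backwards from the current position.
        self.pos = bisect.bisect_left(self.doc_ids, target, max(self.pos, 0))
        return self.doc_id()


def match_must_with_should(must, shoulds):
    """Return (doc, number of matching SHOULD clauses) for every MUST hit."""
    hits = []
    doc = must.next_doc()
    while doc != NO_MORE_DOCS:
        matched = 0
        for sub in shoulds:
            # Only move a SHOULD clause that is behind the MUST clause.
            if sub.doc_id() < doc:
                sub.advance(doc)
            if sub.doc_id() == doc:
                matched += 1
        hits.append((doc, matched))
        doc = must.next_doc()
    return hits


must = PostingsIterator([3, 1000003])
shoulds = [PostingsIterator([1, 2, 3, 500, 1000003]), PostingsIterator([3, 7, 9])]
print(match_must_with_should(must, shoulds))  # prints [(3, 2), (1000003, 1)]
```

The MUST clause drives the loop here, so the SHOULD clauses never visit docs between its hits. The heuristic for choosing between the two scorers is a separate question that this sketch does not address.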