[
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099990#comment-14099990
]
Michael McCandless commented on LUCENE-4396:
--------------------------------------------
Thanks Da, new patch applies cleanly!
But I still see a perf hit on pure OR queries:
{noformat}
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
OrHighMed            29.64   (7.1%)     25.78  (11.6%)   -13.0%  (-29% -   6%)
OrHighLow            23.40   (7.5%)     20.74   (9.4%)   -11.4%  (-26% -   5%)
OrHighHigh           35.16   (7.2%)     31.26   (9.3%)   -11.1%  (-25% -   5%)
OrHighNotLow         52.50   (7.2%)     47.38   (9.4%)    -9.7%  (-24% -   7%)
OrHighNotMed         27.99   (7.2%)     26.08   (9.3%)    -6.8%  (-21% -  10%)
Fuzzy2               38.75   (8.7%)     36.26   (7.6%)    -6.4%  (-20% -  10%)
OrHighNotHigh         7.44   (6.7%)      7.01   (8.7%)    -5.8%  (-19% -  10%)
Fuzzy1               69.64   (4.9%)     67.24   (5.3%)    -3.4%  (-12% -   7%)
LowTerm             357.08   (2.9%)    347.21   (5.9%)    -2.8%  (-11% -   6%)
HighSpanNear          3.11   (2.4%)      3.04   (9.0%)    -2.3%  (-13% -   9%)
Respell              40.87   (4.4%)     39.96   (6.2%)    -2.2%  (-12% -   8%)
LowPhrase            15.36   (2.2%)     15.02   (8.5%)    -2.2%  (-12% -   8%)
MedTerm             154.50   (2.8%)    153.13   (1.5%)    -0.9%  ( -5% -   3%)
AndHighLow          536.51   (6.3%)    532.65  (10.5%)    -0.7%  (-16% -  17%)
HighTerm             24.23   (3.4%)     24.06   (1.7%)    -0.7%  ( -5% -   4%)
AndHighHigh          25.25   (1.0%)     25.15   (1.0%)    -0.4%  ( -2% -   1%)
MedSpanNear           7.36   (3.9%)      7.33   (4.1%)    -0.4%  ( -8% -   7%)
MedPhrase            38.32   (1.9%)     38.17   (2.2%)    -0.4%  ( -4% -   3%)
HighPhrase           42.07   (2.3%)     41.98   (2.4%)    -0.2%  ( -4% -   4%)
AndHighMed           66.56   (1.7%)     66.44   (1.8%)    -0.2%  ( -3% -   3%)
IntNRQ                3.09   (2.1%)      3.09   (2.4%)    -0.1%  ( -4% -   4%)
Wildcard             28.08   (2.2%)     28.06   (1.8%)    -0.1%  ( -4% -   4%)
LowSpanNear          14.61   (4.4%)     14.61   (4.6%)     0.0%  ( -8% -   9%)
LowSloppyPhrase      14.17   (2.0%)     14.19   (1.7%)     0.1%  ( -3% -   3%)
HighSloppyPhrase     10.47   (3.0%)     10.49   (3.0%)     0.2%  ( -5% -   6%)
Prefix3              53.87   (4.3%)     54.21   (3.2%)     0.6%  ( -6% -   8%)
MedSloppyPhrase      45.63   (9.0%)     46.57   (4.6%)     2.1%  (-10% -  17%)
OrNotHighHigh        15.44   (6.3%)     16.02   (6.9%)     3.8%  ( -8% -  18%)
OrNotHighLow         25.75   (6.1%)     29.99  (10.2%)    16.5%  (  0% -  34%)
OrNotHighMed          8.73   (8.9%)     10.20   (5.1%)    16.9%  (  2% -  33%)
{noformat}
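For readers checking the numbers: the Pct diff column appears to be the relative change of QPS comp against QPS base. The sketch below recomputes it for two rows of the table. The class and method names are made up for illustration, and the parenthesized range is deliberately not reproduced, since how the benchmark tool derives it is not stated here.

```java
public class PctDiff {
    // Relative change of the candidate QPS versus the baseline QPS, in percent.
    // Assumption: this is how the "Pct diff" column is derived; the inputs
    // below are the rounded values printed in the table.
    static double pctDiff(double qpsBase, double qpsComp) {
        return 100.0 * (qpsComp - qpsBase) / qpsBase;
    }

    public static void main(String[] args) {
        // Values copied from the OrHighMed and OrHighLow rows.
        System.out.printf("OrHighMed %.1f%%%n", pctDiff(29.64, 25.78));
        System.out.printf("OrHighLow %.1f%%%n", pctDiff(23.40, 20.74));
    }
}
```

Because the table prints rounded QPS values, recomputed figures can differ from the reported column in the last digit for some rows.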
Unfortunately, I've found HotSpot to be very finicky about changes to
BooleanScorer in the past; I'm not sure why. Maybe we should leave
BooleanScorer untouched here (meaning it can't accept MUST clauses) and instead
direct all queries with MUST clauses that meet the switching criteria to BAS?
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks,
> LUCENE-4396-simple.patch, LUCENE-4396-simple.patch, LUCENE-4396-simple.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> SIZE.perf, all.perf, luceneutil-score-equal.patch,
> luceneutil-score-equal.patch, merge-simple.perf, merge-simple.png,
> merge.perf, merge.png, perf.png, stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT
> clauses. If there are one or more MUST clauses we always use BooleanScorer2.
> But I suspect that, unless the MUST clauses have a very low hit count compared
> to the other clauses, BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST, so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as a proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, e.g. if the MUST clause suddenly skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!
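The .advance() idea in the description can be illustrated with a small standalone sketch. It is not Lucene code and not the patch: `Postings`, `score`, and `skipThreshold` are hypothetical names, the iterator works over plain sorted int arrays, and scoring is done doc-at-a-time rather than in BooleanScorer's windows. It only shows the choice between stepping a SHOULD clause with nextDoc() and jumping it with advance() when the MUST clause leaves a large gap.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipAheadSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a postings iterator over sorted doc IDs. */
    static final class Postings {
        private final int[] docs;
        private int pos = -1;

        Postings(int... docs) { this.docs = docs; }

        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int nextDoc() { pos++; return docID(); }

        /** Moves to the first doc >= target; binary search stands in for skip data. */
        int advance(int target) {
            int lo = pos + 1, hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) lo = mid + 1; else hi = mid;
            }
            pos = lo;
            return docID();
        }
    }

    /** Returns {doc, number of matching SHOULD clauses} for each MUST hit. */
    static List<int[]> score(Postings must, List<Postings> shoulds, int skipThreshold) {
        List<int[]> hits = new ArrayList<>();
        for (int doc = must.nextDoc(); doc != NO_MORE_DOCS; doc = must.nextDoc()) {
            int matched = 0;
            for (Postings s : shoulds) {
                if (s.docID() < doc) {
                    if (doc - s.docID() > skipThreshold) {
                        s.advance(doc);                      // large gap: jump
                    } else {
                        while (s.docID() < doc) s.nextDoc(); // small gap: step
                    }
                }
                if (s.docID() == doc) matched++;
            }
            hits.add(new int[] {doc, matched});
        }
        return hits;
    }

    public static void main(String[] args) {
        Postings must = new Postings(5, 1_000_010);
        List<Postings> shoulds = List.of(
            new Postings(1, 2, 3, 5, 1_000_010), new Postings(5, 7, 9));
        for (int[] hit : score(must, shoulds, 16)) {
            System.out.println("doc=" + hit[0] + " shouldMatches=" + hit[1]);
        }
    }
}
```

The threshold of 16 is arbitrary. Choosing when a jump beats stepping is the kind of heuristic the description calls the challenging part.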