[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096924#comment-14096924 ]
Michael McCandless commented on LUCENE-4396:
--------------------------------------------
I ran the default luceneutil tasks on the full English Wikipedia index with
Java 1.8.0_05. I used these JVM options to reduce HotSpot noise:
{noformat}
-server -XX:-UseTieredCompilation -Xbatch
{noformat}
I ran with a 4 GB heap:
{noformat}
Report after iter 19:
                Task  QPS base    StdDev  QPS comp    StdDev               Pct diff
           OrHighLow     22.04   (13.0%)     20.95    (7.8%)    -5.0% (-22% -  18%)
          OrHighHigh     32.92   (12.5%)     31.63    (7.8%)    -3.9% (-21% -  18%)
        OrHighNotMed     27.17    (7.3%)     26.41    (7.7%)    -2.8% (-16% -  13%)
           OrHighMed     27.43   (14.1%)     26.72    (7.8%)    -2.6% (-21% -  22%)
       OrHighNotHigh      7.14    (9.5%)      7.02    (8.1%)    -1.7% (-17% -  17%)
         AndHighHigh     24.87    (3.9%)     24.67    (4.7%)    -0.8% ( -9% -   8%)
           MedPhrase     37.06    (7.8%)     36.78   (11.1%)    -0.8% (-18% -  19%)
        OrHighNotLow     48.28   (13.2%)     48.21    (8.1%)    -0.1% (-18% -  24%)
    HighSloppyPhrase     10.50    (3.2%)     10.48    (3.9%)    -0.1% ( -7% -   7%)
     MedSloppyPhrase     46.47    (4.3%)     46.49    (4.0%)     0.0% ( -7% -   8%)
         MedSpanNear      7.41    (5.2%)      7.43    (5.4%)     0.2% ( -9% -  11%)
             LowTerm    347.65    (1.7%)    351.48    (1.7%)     1.1% ( -2% -   4%)
              Fuzzy1     65.17   (13.9%)     65.96    (9.0%)     1.2% (-18% -  27%)
              IntNRQ      3.02    (7.9%)      3.06    (2.7%)     1.3% ( -8% -  12%)
          HighPhrase     41.20    (5.4%)     41.82    (2.5%)     1.5% ( -6% -   9%)
              Fuzzy2     35.60   (11.8%)     36.23    (8.5%)     1.8% (-16% -  25%)
            PKLookup    202.07    (3.0%)    205.78    (3.6%)     1.8% ( -4% -   8%)
          AndHighMed     64.22    (8.5%)     65.54    (1.6%)     2.1% ( -7% -  13%)
            Wildcard     27.13   (11.1%)     27.75    (5.9%)     2.3% (-13% -  21%)
             Respell     38.65   (10.3%)     39.58    (9.1%)     2.4% (-15% -  24%)
           LowPhrase     14.73   (10.1%)     15.09    (6.6%)     2.5% (-12% -  21%)
        HighSpanNear      3.02    (9.4%)      3.10    (3.9%)     2.9% ( -9% -  17%)
             Prefix3     51.57   (11.0%)     53.19    (4.9%)     3.1% (-11% -  21%)
             MedTerm    148.60    (5.1%)    153.51    (2.1%)     3.3% ( -3% -  11%)
     LowSloppyPhrase     13.82    (7.7%)     14.29    (1.6%)     3.4% ( -5% -  13%)
         LowSpanNear     14.27    (8.9%)     14.78    (5.8%)     3.6% (-10% -  20%)
            HighTerm     23.15    (9.1%)     24.05    (3.8%)     3.9% ( -8% -  18%)
          AndHighLow    466.12   (16.4%)    501.68   (13.8%)     7.6% (-19% -  45%)
       OrNotHighHigh     14.83    (9.1%)     16.04    (5.8%)     8.1% ( -6% -  25%)
        OrNotHighMed      8.47    (9.7%)     10.20    (4.8%)    20.4% (  5% -  38%)
        OrNotHighLow     24.83    (9.5%)     30.66    (4.6%)    23.5% (  8% -  41%)
{noformat}
Looks like OrNot* got faster, but other Or* are maybe a bit slower...
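
As a rough cross-check of that reading, here is a minimal Python sketch that recomputes the Pct diff column from the rounded mean QPS values in the report and flags tasks whose reported range lies entirely on one side of zero. The function names are hypothetical, and treating "range excludes zero" as a sign of a real change is an assumption of this sketch, not something luceneutil states. Because the inputs are the rounded means from the report, recomputed values can differ from the reported ones in the last digit.

```python
def pct_diff(qps_base, qps_comp):
    """Relative change of the comp QPS over the base QPS, in percent."""
    return 100.0 * (qps_comp - qps_base) / qps_base


def range_excludes_zero(low_pct, high_pct):
    """True if the reported (low - high) range is entirely above or below 0.

    Assumption: such a range is read here as a hint that the change is
    not just run-to-run noise.
    """
    return low_pct > 0 or high_pct < 0


# (QPS base, QPS comp, reported range low %, reported range high %),
# copied from a few rows of the report above.
rows = {
    "OrHighLow": (22.04, 20.95, -22, 18),
    "AndHighLow": (466.12, 501.68, -19, 45),
    "OrNotHighHigh": (14.83, 16.04, -6, 25),
    "OrNotHighMed": (8.47, 10.20, 5, 38),
    "OrNotHighLow": (24.83, 30.66, 8, 41),
}

for task, (base, comp, low, high) in rows.items():
    flag = "range excludes 0" if range_excludes_zero(low, high) else "within noise"
    print(f"{task:>14} {pct_diff(base, comp):6.1f}%  {flag}")
```

Of the rows sampled here, only OrNotHighMed and OrNotHighLow have a range that excludes zero; the other Or* differences fall inside their own noise bands.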
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> SIZE.perf, all.perf, luceneutil-score-equal.patch,
> luceneutil-score-equal.patch, merge.perf, merge.png, perf.png, stat.cpp,
> stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared
> to the other clauses, that BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!
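
To illustrate the .advance() idea from the description, here is a toy Python sketch, not Lucene code: `PostingsIterator`, `score_must_with_should`, and the one-point-per-clause scoring are all hypothetical. It shows only the skipping behavior (when the MUST clause jumps far ahead, each SHOULD clause is advanced to the same doc instead of being stepped through every intermediate doc) and does not model BooleanScorer's actual scoring.

```python
from bisect import bisect_left

NO_MORE_DOCS = 2**31 - 1  # sentinel meaning "iterator exhausted"


class PostingsIterator:
    """Toy iterator over a sorted list of doc IDs (hypothetical API)."""

    def __init__(self, doc_ids):
        self.doc_ids = doc_ids
        self.pos = -1  # not positioned yet

    def doc_id(self):
        if self.pos < 0:
            return -1
        if self.pos >= len(self.doc_ids):
            return NO_MORE_DOCS
        return self.doc_ids[self.pos]

    def next_doc(self):
        self.pos += 1
        return self.doc_id()

    def advance(self, target):
        # Jump straight to the first doc ID >= target via binary search.
        self.pos = bisect_left(self.doc_ids, target, max(self.pos, 0))
        return self.doc_id()


def score_must_with_should(must, shoulds):
    """Let the MUST clause drive; SHOULD clauses only add to the score."""
    hits = {}
    doc = must.next_doc()
    while doc != NO_MORE_DOCS:
        score = 1  # one point for the MUST match
        for sub in shoulds:
            if sub.doc_id() < doc:
                sub.advance(doc)  # skip everything the MUST clause skipped
            if sub.doc_id() == doc:
                score += 1
        hits[doc] = score
        doc = must.next_doc()
    return hits


must = PostingsIterator([5, 1000005])
shoulds = [
    PostingsIterator([1, 2, 3, 5, 7, 1000005]),
    PostingsIterator([4, 1000004, 1000006]),
]
print(score_must_with_should(must, shoulds))
```

In this example the MUST clause jumps from doc 5 to doc 1000005, and each SHOULD iterator follows with a single advance() call rather than visiting the docs in between.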