[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Da Huang updated LUCENE-4396:
-----------------------------
    Attachment: LUCENE-4396.patch
                luceneutil-score-equal.patch
                AndOr.tasks
The patch is based on lucene github mirror commit
cfb408ff6788e6fea8215098a785d72fb4e95c5b.
The following things have been done:
1. Renamed TestBooleanNovelScorer to TestBooleanUnevenly. This test suite now
tests both BNS (BooleanNovelScorer) and BS (BooleanScorer) when the hit
documents are unevenly distributed.
2. Following Robert's advice, I now sum scores into a double and cast to float
in ConjunctionScorer. However, it seems to have little effect: the score
difference problem still remains.
3. Added a comment on the within-tolerance score difference check in
luceneutil.
4. Made a new tasks file, which can test "AndSomeOR" cases.
5. Ran luceneutil for "BNS vs BS2" and "BS vs BS2". The results are shown
below.
{code}
BNS vs BS2
Task               QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff
HighAndTonsLowOr          10.95  (3.5%)                     1.52  (0.3%)   -86.1% ( -86% - -85%)
HighAndSomeLowOr          29.98  (6.7%)                    11.84  (2.9%)   -60.5% ( -65% - -54%)
LowAndSomeLowOr          756.81  (1.4%)                   503.21  (2.8%)   -33.5% ( -37% - -29%)
LowAndSomeHighOr          54.25  (2.1%)                    53.26  (2.1%)    -1.8% ( -5% - 2%)
PKLookup                 241.74  (2.8%)                   241.96  (2.3%)     0.1% ( -4% - 5%)
LowAndTonsLowOr           40.23  (1.2%)                    43.19  (7.2%)     7.4% ( 0% - 15%)
LowAndTonsHighOr           2.63  (2.1%)                     2.99  (2.3%)    13.8% ( 9% - 18%)
HighAndSomeHighOr          4.99  (1.8%)                     5.86  (4.7%)    17.4% ( 10% - 24%)
HighAndTonsHighOr          0.09  (1.5%)                     0.22  (8.1%)   145.4% ( 133% - 157%)

BS vs BS2
Task               QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff
HighAndTonsLowOr          16.54  (2.4%)                     3.70  (0.2%)   -77.6% ( -78% - -76%)
HighAndSomeLowOr          11.95  (8.5%)                     4.29  (0.8%)   -64.1% ( -67% - -59%)
LowAndSomeLowOr          839.11  (1.9%)                   540.83  (2.5%)   -35.5% ( -39% - -31%)
LowAndSomeHighOr         149.50  (2.6%)                   136.71  (3.4%)    -8.6% ( -14% - -2%)
HighAndSomeHighOr          3.72  (1.7%)                     3.51  (1.7%)    -5.6% ( -8% - -2%)
PKLookup                 240.32  (2.8%)                   238.87  (2.8%)    -0.6% ( -6% - 5%)
LowAndTonsHighOr           4.96  (2.3%)                     5.35  (3.8%)     7.8% ( 1% - 14%)
LowAndTonsLowOr           35.28  (1.2%)                    39.00  (5.2%)    10.6% ( 4% - 17%)
HighAndTonsHighOr          0.16  (1.1%)                     0.36  (4.0%)   122.6% ( 116% - 129%)
{code}
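For reference, items 2 and 3 above can be illustrated with a small standalone sketch. This is not code from the patch or from luceneutil: the class ScoreSumSketch and its methods sumAsFloat, sumAsDouble and withinTolerance are hypothetical names, and the relative tolerance of 1e-6 is an assumed value. It only shows why accumulating float scores in a different order can give slightly different totals, why accumulating in a double before casting back to float avoids the order dependence in this particular case, and what a tolerance-based comparison might look like.

```java
// Hypothetical illustration only; none of these names come from Lucene or luceneutil.
class ScoreSumSketch {

    // Accumulate in float: each addition rounds to float precision,
    // so the total can depend on the order of the inputs.
    static float sumAsFloat(float[] scores) {
        float sum = 0f;
        for (float s : scores) {
            sum += s;
        }
        return sum;
    }

    // Accumulate in double and cast once at the end (the approach in item 2).
    static float sumAsDouble(float[] scores) {
        double sum = 0.0;
        for (float s : scores) {
            sum += s;
        }
        return (float) sum;
    }

    // Relative comparison: true if a and b differ by at most relTol
    // times the larger magnitude. relTol is an assumed parameter.
    static boolean withinTolerance(float a, float b, float relTol) {
        float diff = Math.abs(a - b);
        float scale = Math.max(Math.abs(a), Math.abs(b));
        return diff <= relTol * scale;
    }

    public static void main(String[] args) {
        // 16777216 is 2^24; above it, float can no longer represent odd integers.
        float[] largeFirst = {16777216f, 1f, 1f};
        float[] smallFirst = {1f, 1f, 16777216f};

        float a = sumAsFloat(largeFirst);   // both small addends are rounded away
        float b = sumAsFloat(smallFirst);   // small addends combine before the large one
        float c = sumAsDouble(largeFirst);
        float d = sumAsDouble(smallFirst);

        System.out.println("float,  large first: " + a);
        System.out.println("float,  small first: " + b);
        System.out.println("double, large first: " + c);
        System.out.println("double, small first: " + d);
        System.out.println("a ~ b within 1e-6:   " + withinTolerance(a, b, 1e-6f));
    }
}
```

In this sketch the two float sums differ by one unit in the last place while the two double-based sums agree. Real scorers can also differ in other ways, such as how partial scores are grouped, which may be why the change in item 2 did not remove the differences.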
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: AndOr.tasks, AndOr.tasks, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> luceneutil-score-equal.patch, luceneutil-score-equal.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that, unless the MUST clauses have very low hit count compared
> to the other clauses, BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!