[
https://issues.apache.org/jira/browse/LUCENE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrien Grand updated LUCENE-6201:
---------------------------------
Attachment: LUCENE-6201.patch
New patch: BooleanScorer can now also handle minShouldMatch. It works by
scoring every window of 2048 documents in which at least minShouldMatch
clauses have a match. However, nothing guarantees that those matches fall on
the same documents, so it is only used for minShouldMatch > 1 when matches are
likely to be dense.
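To illustrate the windowing described above, here is a minimal sketch. It is
not the code from the patch: the class name, the method name, and the use of
sorted int arrays in place of postings iterators are all assumptions made for
the example. A window is only processed when enough clauses have a match
somewhere in it, and documents are then filtered by their per-document count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WindowedMinShouldMatch {
    static final int WINDOW_SIZE = 2048; // window size mentioned above

    /**
     * Each clause is a sorted array of unique doc IDs in [0, maxDoc)
     * (a hypothetical stand-in for a postings iterator).
     */
    static List<Integer> match(int[][] clauses, int minShouldMatch, int maxDoc) {
        List<Integer> hits = new ArrayList<>();
        int[] pos = new int[clauses.length]; // cursor into each clause
        int[] counts = new int[WINDOW_SIZE]; // matching clauses per window slot

        for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
            int end = Math.min(base + WINDOW_SIZE, maxDoc);

            // How many clauses have at least one match in this window?
            int clausesInWindow = 0;
            for (int c = 0; c < clauses.length; c++) {
                if (pos[c] < clauses[c].length && clauses[c][pos[c]] < end) {
                    clausesInWindow++;
                }
            }
            boolean scoreWindow = clausesInWindow >= minShouldMatch;
            if (scoreWindow) {
                Arrays.fill(counts, 0);
            }

            // Consume every match in the window; count only if the window is scored.
            for (int c = 0; c < clauses.length; c++) {
                while (pos[c] < clauses[c].length && clauses[c][pos[c]] < end) {
                    if (scoreWindow) {
                        counts[clauses[c][pos[c]] - base]++;
                    }
                    pos[c]++;
                }
            }

            // The clauses may have matched different documents, so filter per slot.
            if (scoreWindow) {
                for (int i = 0; i < end - base; i++) {
                    if (counts[i] >= minShouldMatch) {
                        hits.add(base + i);
                    }
                }
            }
        }
        return hits;
    }
}
```

With sparse matches, a window can pass the clause-count check and still
contain no document that matches enough clauses, in which case the work spent
on that window is wasted. That is the reason for restricting this approach to
dense matches.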
Here are results from the luceneutil benchmark:
{noformat}
Task                  QPS baseline   StdDev  QPS patch   StdDev  Pct diff
Low4MinShouldMatch4        1349.50   (6.4%)    1064.22   (3.7%)    -21.1% ( -29% -  -11%)
Low3MinShouldMatch4        1225.14  (11.9%)     977.61   (4.6%)    -20.2% ( -32% -   -4%)
Low4MinShouldMatch3        1040.26   (5.4%)     859.33   (3.0%)    -17.4% ( -24% -   -9%)
Low4MinShouldMatch2         316.21   (4.6%)     281.75   (2.5%)    -10.9% ( -17% -   -3%)
Low2MinShouldMatch4         349.07   (7.8%)     316.85   (4.8%)     -9.2% ( -20% -    3%)
Low3MinShouldMatch3         308.45   (5.4%)     280.00   (2.2%)     -9.2% ( -15% -   -1%)
Low4MinShouldMatch0          72.57   (2.9%)      74.43  (11.3%)      2.6% ( -11% -   17%)
Low2MinShouldMatch3          38.11  (10.5%)      39.30  (12.6%)      3.1% ( -18% -   29%)
Low3MinShouldMatch0          47.95   (2.4%)      49.45  (12.5%)      3.1% ( -11% -   18%)
Low1MinShouldMatch4          39.78   (9.7%)      41.05   (2.5%)      3.2% (  -8% -   16%)
PKLookup                    316.64   (2.5%)     327.40   (3.6%)      3.4% (  -2% -    9%)
Low1MinShouldMatch0          30.13   (1.6%)      31.15  (12.8%)      3.4% ( -10% -   18%)
Low2MinShouldMatch0          35.75   (1.8%)      37.01  (12.6%)      3.5% ( -10% -   18%)
HighMinShouldMatch0          25.94   (1.4%)      26.90  (13.0%)      3.7% ( -10% -   18%)
Low3MinShouldMatch2          39.56  (10.3%)      47.62  (13.0%)     20.4% (  -2% -   48%)
HighMinShouldMatch4          22.28  (10.0%)      27.59  (15.5%)     23.8% (  -1% -   54%)
Low1MinShouldMatch3          22.25  (10.5%)      31.02  (16.4%)     39.4% (  11% -   74%)
Low2MinShouldMatch2          23.24  (10.3%)      35.63  (17.5%)     53.3% (  23% -   90%)
HighMinShouldMatch3          16.31  (10.0%)      26.31  (19.6%)     61.3% (  28% -  101%)
Low1MinShouldMatch2          17.24   (9.7%)      30.30  (21.2%)     75.8% (  40% -  118%)
HighMinShouldMatch2          13.98   (9.0%)      26.28  (23.2%)     88.0% (  51% -  132%)
{noformat}
This time the slow queries become faster, but the fast queries also become
slower.
* Queries with minShouldMatch=0 seem to be faster only because BooleanScorer
is used for more queries, which seems to make the JVM happy (if I modify the
patch to stop using BooleanScorer when minShouldMatch > 0, they are no longer
faster).
* On the other hand, queries like Low4MinShouldMatch4 are slower. I tried
reverting MinShouldMatchSumScorer to the previous implementation and got
similar results. It seems that MinShouldMatchSumScorer no longer being used
for most queries in this benchmark makes the JVM unhappy.
> MinShouldMatchSumScorer should advance less and score lazily
> ------------------------------------------------------------
>
> Key: LUCENE-6201
> URL: https://issues.apache.org/jira/browse/LUCENE-6201
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: Trunk, 5.1
>
> Attachments: LUCENE-6201.patch, LUCENE-6201.patch
>
>
> MinShouldMatchSumScorer currently computes the score eagerly, even on
> documents that end up not matching because it cannot find
> {{minShouldMatch}} matches on the same document.
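
The difference between eager and lazy scoring in the issue description can be
sketched as follows. This is only an illustration, not Lucene's
MinShouldMatchSumScorer: the Clause class, its constant per-clause score, and
the -1 "no match" return value are assumptions made for the example. Matches
are counted first, and score() is only called once the document is known to
match.

```java
import java.util.List;
import java.util.Set;

class LazyScoringSketch {
    /** Hypothetical stand-in for a sub-scorer: matching doc IDs plus a constant score. */
    static final class Clause {
        final Set<Integer> docs;
        final double score;
        int scoreCalls = 0; // how many times score() was invoked

        Clause(Set<Integer> docs, double score) {
            this.docs = docs;
            this.score = score;
        }

        boolean matches(int doc) {
            return docs.contains(doc);
        }

        double score() {
            scoreCalls++;
            return score;
        }
    }

    /** Returns the summed score, or -1 if fewer than minShouldMatch clauses match doc. */
    static double scoreLazily(List<Clause> clauses, int doc, int minShouldMatch) {
        // First pass: count matches without computing any score.
        int matching = 0;
        for (Clause c : clauses) {
            if (c.matches(doc)) {
                matching++;
            }
        }
        if (matching < minShouldMatch) {
            return -1; // rejected without ever calling score()
        }
        // Second pass: the document matches, so scoring is now worth the cost.
        double sum = 0;
        for (Clause c : clauses) {
            if (c.matches(doc)) {
                sum += c.score();
            }
        }
        return sum;
    }
}
```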