[ https://issues.apache.org/jira/browse/LUCENE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-7386:
---------------------------------
Attachment: LUCENE-7386.patch
Here is a patch: scorers are flattened when minShouldMatch==1 and scores need
to be summed up. luceneutil seems happy with it, given the following change to
the tasks file:
{noformat}
diff --git a/tasks/wikimedium.10M.nostopwords.tasks b/tasks/wikimedium.10M.nostopwords.tasks
index 342070c..4b36348 100644
--- a/tasks/wikimedium.10M.nostopwords.tasks
+++ b/tasks/wikimedium.10M.nostopwords.tasks
@@ -3735,6 +3735,22 @@ AndHighLow: +2005 +saad # freq=835460 freq=1184
 AndHighLow: +than +sneaks # freq=676864 freq=1291
 AndHighLow: +see +leveling # freq=1044180 freq=943
 AndHighLow: +page +mandel # freq=681036 freq=1866
+OrHighHighHigh: (several following) publisher
+OrHighHighHigh: (2009 film) http
+OrHighHighHigh: (south county) now
+OrHighHighHigh: called (utc until)
+OrHighHighHigh: most (part used)
+OrHighHighHigh: title (2006 references)
+OrHighHighHigh: known (century references)
+OrHighHighHigh: can (against news)
+AndHighOrHighHighHigh: +http (several following) publisher
+AndHighOrHighHighHigh: +now (2009 film) http
+AndHighOrHighHighHigh: +until (south county) now
+AndHighOrHighHighHigh: +used called (utc until)
+AndHighOrHighHighHigh: +references most (part used)
+AndHighOrHighHighHigh: +news title (2006 references)
+AndHighOrHighHighHigh: +several known (century references)
+AndHighOrHighHighHigh: +film can (against news)
 OrHighHigh: several following # freq=436129 freq=416515
 OrHighHigh: publisher end # freq=1289029 freq=526636
 OrHighHigh: 2009 film # freq=887702 freq=432758
{noformat}
OrHighHighHigh is meant to exercise BS1, and AndHighOrHighHighHigh to exercise BS2.
{noformat}
Task                   QPS baseline   StdDev  QPS patch   StdDev  Pct diff
Fuzzy2                        73.80  (14.5%)      69.06  (22.5%)   -6.4%  ( -37% - 35%)
Fuzzy1                        86.33   (8.1%)      82.89   (9.6%)   -4.0%  ( -20% - 14%)
OrNotHighLow                1204.34   (4.0%)    1188.30   (4.0%)   -1.3%  ( -8% - 6%)
OrNotHighMed                 146.82   (2.7%)     145.94   (2.9%)   -0.6%  ( -6% - 5%)
MedTerm                      158.21   (6.7%)     157.58   (6.5%)   -0.4%  ( -12% - 13%)
OrNotHighHigh                 67.20   (4.8%)      66.99   (4.4%)   -0.3%  ( -9% - 9%)
OrHighNotMed                 121.66   (8.5%)     121.38   (8.3%)   -0.2%  ( -15% - 18%)
Prefix3                       36.48   (7.1%)      36.40   (6.8%)   -0.2%  ( -13% - 14%)
OrHighNotLow                 136.63   (9.2%)     136.35   (9.5%)   -0.2%  ( -17% - 20%)
HighSloppyPhrase              56.20   (6.7%)      56.09   (6.0%)   -0.2%  ( -12% - 13%)
MedPhrase                     47.37   (2.3%)      47.28   (2.4%)   -0.2%  ( -4% - 4%)
LowPhrase                     47.39   (2.2%)      47.31   (2.8%)   -0.2%  ( -5% - 4%)
Respell                       64.37   (3.1%)      64.26   (3.6%)   -0.2%  ( -6% - 6%)
Wildcard                      39.79   (5.9%)      39.72   (6.0%)   -0.2%  ( -11% - 12%)
IntNRQ                        11.80  (18.8%)      11.79  (18.6%)   -0.1%  ( -31% - 45%)
AndHighHigh                   81.62   (3.0%)      81.56   (2.6%)   -0.1%  ( -5% - 5%)
HighSpanNear                   9.39   (3.8%)       9.38   (3.2%)   -0.1%  ( -6% - 7%)
LowSpanNear                   17.78   (3.1%)      17.77   (2.9%)   -0.0%  ( -5% - 6%)
MedSpanNear                   11.97   (3.5%)      11.96   (3.1%)   -0.0%  ( -6% - 6%)
HighTerm                     102.38   (6.8%)     102.38   (6.2%)    0.0%  ( -12% - 13%)
OrHighLow                    131.23   (6.6%)     131.26   (6.5%)    0.0%  ( -12% - 13%)
MedSloppyPhrase               41.51   (4.3%)      41.57   (3.9%)    0.2%  ( -7% - 8%)
LowSloppyPhrase               16.08   (6.2%)      16.11   (5.7%)    0.2%  ( -11% - 12%)
HighPhrase                    14.70   (2.9%)      14.74   (2.6%)    0.3%  ( -5% - 5%)
AndHighMed                   154.49   (3.4%)     154.97   (2.5%)    0.3%  ( -5% - 6%)
OrHighNotHigh                 50.78   (6.6%)      50.96   (6.6%)    0.4%  ( -12% - 14%)
AndHighLow                   673.89   (4.2%)     677.46   (2.7%)    0.5%  ( -6% - 7%)
LowTerm                      599.83   (9.3%)     605.98   (9.4%)    1.0%  ( -16% - 21%)
OrHighHigh                    31.23   (5.6%)      31.70   (5.5%)    1.5%  ( -9% - 13%)
OrHighMed                     38.87   (5.3%)      39.60   (5.2%)    1.9%  ( -8% - 13%)
AndHighOrHighHighHigh         22.74   (3.2%)      24.10   (3.3%)    6.0%  ( 0% - 12%)
OrHighHighHigh                 7.47   (5.8%)       8.47   (6.0%)   13.3%  ( 1% - 26%)
{noformat}
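For illustration only, the flattening idea described above can be sketched on a toy query tree. Everything in this sketch (FlattenDisjunctions, Node, Term, Or, flatten) is hypothetical and is not taken from the patch or from Lucene's API. It also assumes that scores are always summed, so the only condition it checks is minShouldMatch==1 on both the parent and the nested disjunction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a query tree. These types are illustrative, not Lucene classes. */
public class FlattenDisjunctions {

    /** A node is either a term leaf or a disjunction of optional clauses. */
    interface Node {}

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    static final class Or implements Node {
        final int minShouldMatch;
        final List<Node> clauses;
        Or(int minShouldMatch, Node... clauses) {
            this.minShouldMatch = minShouldMatch;
            this.clauses = new ArrayList<>(Arrays.asList(clauses));
        }
        @Override public String toString() { return "Or" + clauses; }
    }

    /**
     * Inlines a nested disjunction into its parent when both require only
     * one matching clause. Assumption: the score is a plain sum over the
     * matching clauses, so the grouping of clauses does not matter.
     */
    static Node flatten(Node node) {
        if (!(node instanceof Or)) {
            return node; // leaves are returned unchanged
        }
        Or or = (Or) node;
        List<Node> flat = new ArrayList<>();
        for (Node clause : or.clauses) {
            Node child = flatten(clause); // flatten bottom-up
            if (or.minShouldMatch == 1
                    && child instanceof Or
                    && ((Or) child).minShouldMatch == 1) {
                flat.addAll(((Or) child).clauses); // lift grandchildren
            } else {
                flat.add(child); // keep the clause as a single unit
            }
        }
        return new Or(or.minShouldMatch, flat.toArray(new Node[0]));
    }

    public static void main(String[] args) {
        // Same shape as the OrHighHighHigh task "(several following) publisher".
        Node query = new Or(1,
                new Or(1, new Term("several"), new Term("following")),
                new Term("publisher"));
        System.out.println(flatten(query)); // prints Or[several, following, publisher]
    }
}
```

A nested disjunction with minShouldMatch greater than 1 is left alone in this sketch, since lifting its clauses into the parent would change which documents match.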
> Flatten nested disjunctions
> ---------------------------
>
> Key: LUCENE-7386
> URL: https://issues.apache.org/jira/browse/LUCENE-7386
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7386.patch
>
>
> Now that coords are gone, it is easier to flatten nested disjunctions. It
> might sound weird to write nested disjunctions in the first place, but
> disjunctions can be created implicitly by other queries such as
> more-like-this, LatLonPoint.newBoxQuery, non-scoring synonym queries, etc.