[jira] [Updated] (LUCENE-7386) Flatten nested disjunctions

Adrien Grand (JIRA) Tue, 19 Jul 2016 08:40:07 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adrien Grand updated LUCENE-7386:
---------------------------------
    Attachment: LUCENE-7386.patch

Here is a patch: scorers are flattened when minShouldMatch==1 and scores need 
to be summed up. luceneutil seems happy with it once the following change is 
applied to the tasks file:

{noformat}
diff --git a/tasks/wikimedium.10M.nostopwords.tasks b/tasks/wikimedium.10M.nostopwords.tasks
index 342070c..4b36348 100644
--- a/tasks/wikimedium.10M.nostopwords.tasks
+++ b/tasks/wikimedium.10M.nostopwords.tasks
@@ -3735,6 +3735,22 @@ AndHighLow: +2005 +saad # freq=835460 freq=1184
 AndHighLow: +than +sneaks # freq=676864 freq=1291
 AndHighLow: +see +leveling # freq=1044180 freq=943
 AndHighLow: +page +mandel # freq=681036 freq=1866
+OrHighHighHigh: (several following) publisher
+OrHighHighHigh: (2009 film) http
+OrHighHighHigh: (south county) now
+OrHighHighHigh: called (utc until)
+OrHighHighHigh: most (part used)
+OrHighHighHigh: title (2006 references)
+OrHighHighHigh: known (century references)
+OrHighHighHigh: can (against news)
+AndHighOrHighHighHigh: +http (several following) publisher
+AndHighOrHighHighHigh: +now (2009 film) http
+AndHighOrHighHighHigh: +until (south county) now
+AndHighOrHighHighHigh: +used called (utc until)
+AndHighOrHighHighHigh: +references most (part used)
+AndHighOrHighHighHigh: +news title (2006 references)
+AndHighOrHighHighHigh: +several known (century references)
+AndHighOrHighHighHigh: +film can (against news)
 OrHighHigh: several following # freq=436129 freq=416515
 OrHighHigh: publisher end # freq=1289029 freq=526636
 OrHighHigh: 2009 film # freq=887702 freq=432758
{noformat}

The goal of OrHighHighHigh is to test BS1, and the goal of AndHighOrHighHighHigh is to test BS2.
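
To illustrate what flattening does to a query such as "(several following) publisher", here is a minimal sketch in Python. It is not the code from the patch: Term, Or and flatten are hypothetical names, and the rule that both the parent and the nested disjunction must have min_should_match == 1 is an assumption made so that the flattened tree matches the same documents and sums the same clause scores.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Term:
    """Leaf clause matching one term (hypothetical stand-in for a term scorer)."""
    text: str


@dataclass(frozen=True)
class Or:
    """Disjunction over optional clauses (hypothetical stand-in for a disjunction scorer)."""
    clauses: tuple
    min_should_match: int = 1


Node = Union[Term, Or]


def flatten(node: Node) -> Node:
    """Inline nested disjunctions into their parent where that is safe."""
    if isinstance(node, Term):
        return node
    flat = []
    for clause in node.clauses:
        clause = flatten(clause)
        # Assumption: inlining is only safe when both levels need a single
        # matching clause; otherwise the match semantics would change.
        if (
            isinstance(clause, Or)
            and clause.min_should_match == 1
            and node.min_should_match == 1
        ):
            flat.extend(clause.clauses)
        else:
            flat.append(clause)
    return Or(tuple(flat), node.min_should_match)


# "(several following) publisher" from the OrHighHighHigh tasks.
nested = Or((Or((Term("several"), Term("following"))), Term("publisher")))
flattened = flatten(nested)
```

With this sketch, flattened holds a single disjunction over the three terms, while a nested disjunction with min_should_match=2 would be left in place.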

{noformat}
                    Task QPS baseline      StdDev   QPS patch      StdDev                Pct diff
                  Fuzzy2        73.80     (14.5%)       69.06     (22.5%)   -6.4% ( -37% -   35%)
                  Fuzzy1        86.33      (8.1%)       82.89      (9.6%)   -4.0% ( -20% -   14%)
            OrNotHighLow      1204.34      (4.0%)     1188.30      (4.0%)   -1.3% (  -8% -    6%)
            OrNotHighMed       146.82      (2.7%)      145.94      (2.9%)   -0.6% (  -6% -    5%)
                 MedTerm       158.21      (6.7%)      157.58      (6.5%)   -0.4% ( -12% -   13%)
           OrNotHighHigh        67.20      (4.8%)       66.99      (4.4%)   -0.3% (  -9% -    9%)
            OrHighNotMed       121.66      (8.5%)      121.38      (8.3%)   -0.2% ( -15% -   18%)
                 Prefix3        36.48      (7.1%)       36.40      (6.8%)   -0.2% ( -13% -   14%)
            OrHighNotLow       136.63      (9.2%)      136.35      (9.5%)   -0.2% ( -17% -   20%)
        HighSloppyPhrase        56.20      (6.7%)       56.09      (6.0%)   -0.2% ( -12% -   13%)
               MedPhrase        47.37      (2.3%)       47.28      (2.4%)   -0.2% (  -4% -    4%)
               LowPhrase        47.39      (2.2%)       47.31      (2.8%)   -0.2% (  -5% -    4%)
                 Respell        64.37      (3.1%)       64.26      (3.6%)   -0.2% (  -6% -    6%)
                Wildcard        39.79      (5.9%)       39.72      (6.0%)   -0.2% ( -11% -   12%)
                  IntNRQ        11.80     (18.8%)       11.79     (18.6%)   -0.1% ( -31% -   45%)
             AndHighHigh        81.62      (3.0%)       81.56      (2.6%)   -0.1% (  -5% -    5%)
            HighSpanNear         9.39      (3.8%)        9.38      (3.2%)   -0.1% (  -6% -    7%)
             LowSpanNear        17.78      (3.1%)       17.77      (2.9%)   -0.0% (  -5% -    6%)
             MedSpanNear        11.97      (3.5%)       11.96      (3.1%)   -0.0% (  -6% -    6%)
                HighTerm       102.38      (6.8%)      102.38      (6.2%)    0.0% ( -12% -   13%)
               OrHighLow       131.23      (6.6%)      131.26      (6.5%)    0.0% ( -12% -   13%)
         MedSloppyPhrase        41.51      (4.3%)       41.57      (3.9%)    0.2% (  -7% -    8%)
         LowSloppyPhrase        16.08      (6.2%)       16.11      (5.7%)    0.2% ( -11% -   12%)
              HighPhrase        14.70      (2.9%)       14.74      (2.6%)    0.3% (  -5% -    5%)
              AndHighMed       154.49      (3.4%)      154.97      (2.5%)    0.3% (  -5% -    6%)
           OrHighNotHigh        50.78      (6.6%)       50.96      (6.6%)    0.4% ( -12% -   14%)
              AndHighLow       673.89      (4.2%)      677.46      (2.7%)    0.5% (  -6% -    7%)
                 LowTerm       599.83      (9.3%)      605.98      (9.4%)    1.0% ( -16% -   21%)
              OrHighHigh        31.23      (5.6%)       31.70      (5.5%)    1.5% (  -9% -   13%)
               OrHighMed        38.87      (5.3%)       39.60      (5.2%)    1.9% (  -8% -   13%)
   AndHighOrHighHighHigh        22.74      (3.2%)       24.10      (3.3%)    6.0% (   0% -   12%)
          OrHighHighHigh         7.47      (5.8%)        8.47      (6.0%)   13.3% (   1% -   26%)
{noformat}
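
For readers checking the numbers, the leading figure in the "Pct diff" column appears to be the relative change in mean QPS between baseline and patch. The sketch below recomputes it for three rows, assuming that definition. pct_diff is a hypothetical helper, not a luceneutil function, and because the table prints QPS rounded to two decimals the recomputed value can differ from the printed one in the last digit. The range in parentheses is not reproduced here.

```python
def pct_diff(qps_baseline: float, qps_patch: float) -> float:
    """Relative change of the patched QPS versus the baseline, in percent."""
    return 100.0 * (qps_patch - qps_baseline) / qps_baseline


# Mean QPS values (baseline, patch) copied from the table above.
rows = {
    "Fuzzy2": (73.80, 69.06),
    "AndHighOrHighHighHigh": (22.74, 24.10),
    "OrHighHighHigh": (7.47, 8.47),
}

for task, (baseline, patch) in rows.items():
    print(f"{task:>24} {pct_diff(baseline, patch):+6.1f}%")
```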

> Flatten nested disjunctions
> ---------------------------
>
>                 Key: LUCENE-7386
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7386
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7386.patch
>
>
> Now that coords are gone it became easier to flatten nested disjunctions. It 
> might sound weird to write nested disjunctions in the first place, but 
> disjunctions can be created implicitly by other queries such as 
> more-like-this, LatLonPoint.newBoxQuery, non-scoring synonym queries, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

