[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Michael McCandless (JIRA) Mon, 18 Aug 2014 02:51:51 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100492#comment-14100492 ]


Michael McCandless commented on LUCENE-4396:
--------------------------------------------

OK indeed I see effectively no perf diffs for the default tasks:

{noformat}


Report after iter 19:
                    Task    QPS base      StdDev    QPS comp      StdDev    Pct diff
                 LowTerm      159.50     (18.8%)      157.56     (17.0%)       -1.2% ( -31% -   42%)
               LowPhrase        9.13      (2.3%)        9.10      (3.0%)       -0.3% (  -5% -    5%)
              HighPhrase       22.96      (3.2%)       22.89      (4.0%)       -0.3% (  -7% -    7%)
               MedPhrase       20.96      (2.6%)       20.91      (3.5%)       -0.2% (  -6% -    6%)
         LowSloppyPhrase        9.01      (4.2%)        9.02      (4.3%)        0.1% (  -8% -    8%)
                  Fuzzy1       34.93      (4.3%)       34.96      (5.2%)        0.1% (  -9% -   10%)
                 Respell       23.59      (2.9%)       23.64      (2.9%)        0.2% (  -5% -    6%)
         MedSloppyPhrase       27.69      (5.1%)       27.76      (4.8%)        0.3% (  -9% -   10%)
        HighSloppyPhrase        6.39      (6.3%)        6.41      (6.4%)        0.3% ( -11% -   13%)
              AndHighMed       39.17      (1.9%)       39.30      (2.1%)        0.4% (  -3% -    4%)
                 MedTerm       76.73      (9.0%)       77.02      (8.6%)        0.4% ( -15% -   19%)
             AndHighHigh       15.19      (1.6%)       15.26      (2.4%)        0.4% (  -3% -    4%)
             MedSpanNear        4.14      (4.7%)        4.16      (5.7%)        0.4% (  -9% -   11%)
            HighSpanNear        1.49      (3.3%)        1.50      (4.6%)        0.5% (  -7% -    8%)
             LowSpanNear        8.60      (6.0%)        8.67      (7.5%)        0.8% ( -11% -   15%)
                HighTerm       13.12      (8.6%)       13.24     (10.1%)        0.9% ( -16% -   21%)
               OrHighMed       15.47      (6.3%)       15.62      (6.0%)        0.9% ( -10% -   14%)
           OrNotHighHigh        8.61      (7.2%)        8.70      (6.9%)        1.1% ( -12% -   16%)
            OrHighNotLow       26.60      (5.8%)       26.95      (5.9%)        1.3% (  -9% -   13%)
            OrHighNotMed       14.53      (6.6%)       14.72      (6.1%)        1.3% ( -10% -   15%)
               OrHighLow       12.25      (6.5%)       12.42      (6.9%)        1.4% ( -11% -   15%)
           OrHighNotHigh        4.06      (7.3%)        4.12      (6.5%)        1.4% ( -11% -   16%)
                 Prefix3       30.14      (3.5%)       30.58      (4.2%)        1.4% (  -6% -    9%)
              OrHighHigh       18.13      (6.1%)       18.40      (6.1%)        1.5% ( -10% -   14%)
            OrNotHighLow       14.43      (7.6%)       14.65      (7.6%)        1.5% ( -12% -   18%)
                Wildcard       15.10      (4.2%)       15.34      (6.2%)        1.6% (  -8% -   12%)
                  Fuzzy2       20.01      (4.0%)       20.39      (3.7%)        1.9% (  -5% -   10%)
              AndHighLow      278.92      (3.2%)      284.79      (3.7%)        2.1% (  -4% -    9%)
            OrNotHighMed        5.10      (7.8%)        5.22      (7.6%)        2.2% ( -12% -   19%)
                  IntNRQ        1.63      (5.7%)        1.69     (10.3%)        3.7% ( -11% -   20%)
{noformat}

I'll run with And.tasks next...

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks, 
> LUCENE-4396-simple.patch, LUCENE-4396-simple.patch, LUCENE-4396-simple.patch, 
> LUCENE-4396-simple.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, SIZE.perf, all.perf, 
> luceneutil-score-equal.patch, luceneutil-score-equal.patch, 
> merge-simple.perf, merge-simple.png, merge.perf, merge.png, perf.png, 
> stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there are one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have a very low hit count compared 
> to the other clauses, BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as a proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, e.g. if suddenly the MUST clause skips 1000000 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!
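
The .advance() idea in the description above can be illustrated with a toy model. This is only a sketch, not Lucene's BooleanScorer: ToyPostings and MustShouldSketch are hypothetical names, postings are plain sorted int arrays, binary search stands in for skip lists, and SKIP_THRESHOLD is an arbitrary assumption rather than a tuned heuristic. It shows a SHOULD clause being stepped linearly while the MUST clause moves in small gaps, and being skipped forward once the MUST clause jumps far ahead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy postings iterator over a sorted int[] (hypothetical; not Lucene's DocIdSetIterator). */
final class ToyPostings {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int pos = -1;
    int nextCalls = 0;     // number of linear steps taken
    int advanceCalls = 0;  // number of skips taken

    ToyPostings(int... sortedDocs) { this.docs = sortedDocs; }

    int docID() {
        if (pos < 0) return -1;
        return pos >= docs.length ? NO_MORE_DOCS : docs[pos];
    }

    int nextDoc() { nextCalls++; pos++; return docID(); }

    /** Moves to the first doc >= target; caller guarantees target > docID(). */
    int advance(int target) {
        advanceCalls++;
        int idx = Arrays.binarySearch(docs, pos + 1, docs.length, target);
        pos = idx >= 0 ? idx : -idx - 1;
        return docID();
    }
}

final class MustShouldSketch {
    // Assumed cutoff: gaps larger than this use advance() instead of nextDoc().
    static final int SKIP_THRESHOLD = 1000;

    /** Brings a SHOULD clause up to target, choosing between stepping and skipping. */
    static void catchUp(ToyPostings it, int target) {
        if (it.docID() >= target) return;
        if (target - it.docID() > SKIP_THRESHOLD) {
            it.advance(target);
        } else {
            while (it.docID() < target) it.nextDoc();
        }
    }

    /** Returns {doc, matching SHOULD clause count} for every doc of the MUST clause. */
    static List<int[]> match(ToyPostings must, ToyPostings... shoulds) {
        List<int[]> hits = new ArrayList<>();
        for (int doc = must.nextDoc(); doc != ToyPostings.NO_MORE_DOCS; doc = must.nextDoc()) {
            int matched = 0;
            for (ToyPostings s : shoulds) {
                catchUp(s, doc);
                if (s.docID() == doc) matched++;
            }
            hits.add(new int[] {doc, matched});
        }
        return hits;
    }
}
```

With a MUST clause matching docs 5 and 2000000, a SHOULD clause is stepped with nextDoc() up to doc 5 and then moved with a single advance() to doc 2000000 instead of visiting everything in between.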


