[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Michael McCandless (JIRA) Sun, 17 Aug 2014 10:16:37 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099990#comment-14099990 ]


Michael McCandless commented on LUCENE-4396:
--------------------------------------------

Thanks Da, new patch applies cleanly!

But I still see some perf hit to pure OR queries:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
               OrHighMed       29.64      (7.1%)       25.78     (11.6%)  -13.0% ( -29% -    6%)
               OrHighLow       23.40      (7.5%)       20.74      (9.4%)  -11.4% ( -26% -    5%)
              OrHighHigh       35.16      (7.2%)       31.26      (9.3%)  -11.1% ( -25% -    5%)
            OrHighNotLow       52.50      (7.2%)       47.38      (9.4%)   -9.7% ( -24% -    7%)
            OrHighNotMed       27.99      (7.2%)       26.08      (9.3%)   -6.8% ( -21% -   10%)
                  Fuzzy2       38.75      (8.7%)       36.26      (7.6%)   -6.4% ( -20% -   10%)
           OrHighNotHigh        7.44      (6.7%)        7.01      (8.7%)   -5.8% ( -19% -   10%)
                  Fuzzy1       69.64      (4.9%)       67.24      (5.3%)   -3.4% ( -12% -    7%)
                 LowTerm      357.08      (2.9%)      347.21      (5.9%)   -2.8% ( -11% -    6%)
            HighSpanNear        3.11      (2.4%)        3.04      (9.0%)   -2.3% ( -13% -    9%)
                 Respell       40.87      (4.4%)       39.96      (6.2%)   -2.2% ( -12% -    8%)
               LowPhrase       15.36      (2.2%)       15.02      (8.5%)   -2.2% ( -12% -    8%)
                 MedTerm      154.50      (2.8%)      153.13      (1.5%)   -0.9% (  -5% -    3%)
              AndHighLow      536.51      (6.3%)      532.65     (10.5%)   -0.7% ( -16% -   17%)
                HighTerm       24.23      (3.4%)       24.06      (1.7%)   -0.7% (  -5% -    4%)
             AndHighHigh       25.25      (1.0%)       25.15      (1.0%)   -0.4% (  -2% -    1%)
             MedSpanNear        7.36      (3.9%)        7.33      (4.1%)   -0.4% (  -8% -    7%)
               MedPhrase       38.32      (1.9%)       38.17      (2.2%)   -0.4% (  -4% -    3%)
              HighPhrase       42.07      (2.3%)       41.98      (2.4%)   -0.2% (  -4% -    4%)
              AndHighMed       66.56      (1.7%)       66.44      (1.8%)   -0.2% (  -3% -    3%)
                  IntNRQ        3.09      (2.1%)        3.09      (2.4%)   -0.1% (  -4% -    4%)
                Wildcard       28.08      (2.2%)       28.06      (1.8%)   -0.1% (  -4% -    4%)
             LowSpanNear       14.61      (4.4%)       14.61      (4.6%)    0.0% (  -8% -    9%)
         LowSloppyPhrase       14.17      (2.0%)       14.19      (1.7%)    0.1% (  -3% -    3%)
        HighSloppyPhrase       10.47      (3.0%)       10.49      (3.0%)    0.2% (  -5% -    6%)
                 Prefix3       53.87      (4.3%)       54.21      (3.2%)    0.6% (  -6% -    8%)
         MedSloppyPhrase       45.63      (9.0%)       46.57      (4.6%)    2.1% ( -10% -   17%)
           OrNotHighHigh       15.44      (6.3%)       16.02      (6.9%)    3.8% (  -8% -   18%)
            OrNotHighLow       25.75      (6.1%)       29.99     (10.2%)   16.5% (   0% -   34%)
            OrNotHighMed        8.73      (8.9%)       10.20      (5.1%)   16.9% (   2% -   33%)
{noformat}
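For reference, the "Pct diff" point estimate is the candidate's QPS change relative to the baseline. A minimal Java sketch of that arithmetic recomputes a few rows from the table. The class and method names are illustrative and not luceneutil's code, and the bracketed range is not reproduced.

```java
import java.util.Locale;

// Illustrative only: recomputes the "Pct diff" point estimate from the rounded
// QPS values shown in the table; the bracketed range is not reproduced here.
final class PctDiff {
    // Relative change of the candidate ("QPS comp") against the baseline ("QPS base"), in percent.
    static double pctDiff(double qpsBase, double qpsComp) {
        return (qpsComp - qpsBase) / qpsBase * 100.0;
    }

    public static void main(String[] args) {
        String[] tasks = {"OrHighMed", "OrHighLow", "OrNotHighLow"};
        double[] base = {29.64, 23.40, 25.75};
        double[] comp = {25.78, 20.74, 29.99};
        for (int i = 0; i < tasks.length; i++) {
            // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM's default locale.
            System.out.println(String.format(Locale.ROOT, "%s %.1f%%", tasks[i], pctDiff(base[i], comp[i])));
        }
    }
}
```

Running it prints -13.0%, -11.4% and 16.5% for the three tasks, matching the table. Recomputing OrNotHighMed the same way gives 16.8% rather than the reported 16.9%, most likely because the displayed QPS values are rounded.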

Unfortunately, I've found HotSpot to be very finicky about changes to 
BooleanScorer in the past; I'm not sure why.  Maybe we should leave 
BooleanScorer untouched here (meaning it can't accept MUST clauses) and 
instead route all queries with MUST clauses that meet the switching criteria 
to BAS?
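The comment leaves the switching criteria open; the issue description quoted below suggests firstDocID as a proxy for hit count. A minimal sketch of that routing, assuming matches are spread roughly uniformly over the index, could look like this. Everything in it is hypothetical: ScorerChoice, SPARSE_RATIO and its value of 10 are made up for illustration, and BAS is treated only as an opaque label for the patch's scorer.

```java
import java.util.List;

// Hypothetical routing sketch: ScorerChoice, SPARSE_RATIO and the estimate are illustrative,
// and BAS is only a label for the patch's scorer.
final class ScorerChoice {
    enum Choice { BOOLEAN_SCORER, BOOLEAN_SCORER_2, BAS }

    // Hypothetical threshold: if the rarest MUST clause is estimated to match at least
    // this many times fewer docs than the most frequent SHOULD clause, leapfrogging
    // with advance() (BooleanScorer2) is assumed to win.
    static final double SPARSE_RATIO = 10.0;

    // With matches spread uniformly over [0, maxDoc), the first match lands near
    // maxDoc / hitCount, so maxDoc / (firstDocId + 1) is a crude hit-count estimate.
    // Double arithmetic avoids int overflow for an exhausted clause (Integer.MAX_VALUE).
    static double estimateHits(int firstDocId, int maxDoc) {
        return maxDoc / ((double) firstDocId + 1.0);
    }

    static Choice choose(List<Integer> mustFirstDocs, List<Integer> shouldFirstDocs, int maxDoc) {
        if (mustFirstDocs.isEmpty()) return Choice.BOOLEAN_SCORER;      // SHOULD/MUST_NOT only: unchanged
        if (shouldFirstDocs.isEmpty()) return Choice.BOOLEAN_SCORER_2;  // pure conjunction
        double rarestMust = Double.POSITIVE_INFINITY;
        for (int doc : mustFirstDocs) {
            rarestMust = Math.min(rarestMust, estimateHits(doc, maxDoc));
        }
        double mostFrequentShould = 0.0;
        for (int doc : shouldFirstDocs) {
            mostFrequentShould = Math.max(mostFrequentShould, estimateHits(doc, maxDoc));
        }
        return rarestMust * SPARSE_RATIO < mostFrequentShould ? Choice.BOOLEAN_SCORER_2 : Choice.BAS;
    }

    public static void main(String[] args) {
        // MUST first matches at doc 99,999 (~10 hits); SHOULD at doc 9 (~100,000 hits).
        System.out.println(choose(List.of(99_999), List.of(9), 1_000_000)); // BOOLEAN_SCORER_2
        // MUST clause is denser than the SHOULD clause.
        System.out.println(choose(List.of(9), List.of(99), 1_000_000));     // BAS
    }
}
```

The design keeps BooleanScorer's own code path unchanged, as the comment proposes, and puts all of the heuristic into the dispatch step, so tuning the threshold never touches the hot scoring loop.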

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks, 
> LUCENE-4396-simple.patch, LUCENE-4396-simple.patch, LUCENE-4396-simple.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> SIZE.perf, all.perf, luceneutil-score-equal.patch, 
> luceneutil-score-equal.patch, merge-simple.perf, merge-simple.png, 
> merge.perf, merge.png, perf.png, stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there are one or more MUST clauses, we always use BooleanScorer2.
> But I suspect that, unless the MUST clauses have a very low hit count compared 
> to the other clauses, BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST, so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as a proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, e.g. if the MUST clause suddenly skips 1,000,000 docs, then you 
> want to .advance() all the SHOULD clauses.
> I won't have near-term time to work on this, so feel free to take it if you 
> are inspired!
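The description's last suggestion, having BooleanScorer call .advance() on the SHOULD subs when a MUST clause jumps far ahead, can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not Lucene's BooleanScorer: Postings is a hypothetical stand-in that only mimics the nextDoc()/advance() contract of DocIdSetIterator, the 8-doc window is deliberately tiny, and the "score" is just the number of matching clauses. Each window starts at the furthest-ahead MUST clause's current doc, so a SHOULD iterator that lags behind gets a single advance() instead of being stepped through docs that can never be hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of window-based ("bucket") Boolean scoring with MUST support.
// Not Lucene's BooleanScorer: Postings, WINDOW and the match-count "score" are hypothetical.
final class WindowedBooleanSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    static final int WINDOW = 8; // hypothetical, tiny so the example is easy to trace

    // Stand-in for a postings iterator over a sorted docID array; advance(target)
    // returns the first doc >= target, mirroring DocIdSetIterator's contract.
    static final class Postings {
        private final int[] docs;
        private int i = -1;
        int advances = 0; // number of advance() calls, to show where skipping happened

        Postings(int... docs) { this.docs = docs; }

        int docID() { return i < 0 ? -1 : (i < docs.length ? docs[i] : NO_MORE_DOCS); }

        int nextDoc() { i++; return docID(); }

        int advance(int target) {
            advances++;
            do { i++; } while (i < docs.length && docs[i] < target);
            return docID();
        }
    }

    // Adds p's docs in [start, end) to the per-window buckets; p ends on its first doc >= end.
    private static void collectWindow(Postings p, int start, int end, int[] required, int[] matched) {
        if (p.docID() < start) p.advance(start); // jump over docs that cannot be hits
        for (int d = p.docID(); d < end; d = p.nextDoc()) {
            matched[d - start]++;
            if (required != null) required[d - start]++;
        }
    }

    // Returns "doc:clausesMatched" for each doc matching every MUST clause.
    static List<String> score(List<Postings> must, List<Postings> should) {
        if (must.isEmpty()) throw new IllegalArgumentException("sketch needs at least one MUST clause");
        List<String> hits = new ArrayList<>();
        int[] required = new int[WINDOW]; // MUST clauses matched, per window slot
        int[] matched = new int[WINDOW];  // MUST + SHOULD clauses matched, per window slot
        for (Postings p : must) p.nextDoc();
        for (Postings p : should) p.nextDoc();
        while (true) {
            // No doc below the furthest-ahead MUST clause can match all MUST clauses,
            // so the window starts there, however far that is from the previous window.
            int start = 0;
            for (Postings p : must) start = Math.max(start, p.docID());
            if (start == NO_MORE_DOCS) break;
            int end = (int) Math.min((long) start + WINDOW, NO_MORE_DOCS);
            Arrays.fill(required, 0);
            Arrays.fill(matched, 0);
            for (Postings p : must) collectWindow(p, start, end, required, matched);
            for (Postings p : should) collectWindow(p, start, end, null, matched);
            for (int slot = 0; slot < end - start; slot++) {
                if (required[slot] == must.size()) hits.add((start + slot) + ":" + matched[slot]);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Postings must = new Postings(3, 5, 40, 41);
        Postings shouldA = new Postings(1, 3, 4, 5, 6, 7, 10, 11, 12, 13, 20, 30, 40);
        Postings shouldB = new Postings(5, 41);
        System.out.println(score(List.of(must), List.of(shouldA, shouldB)));
        System.out.println("shouldA.advance() calls: " + shouldA.advances);
    }
}
```

On these postings it prints [3:2, 5:3, 40:2, 41:2], and shouldA receives two advance() calls; the second jumps from doc 11 straight to doc 40 instead of visiting docs 12, 13, 20 and 30. A real implementation would still need to decide when such skipping beats plain nextDoc() iteration, which is the heuristic question raised in the description.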



--
This message was sent by Atlassian JIRA
(v6.2#6252)

