[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Michael McCandless (JIRA) Thu, 14 Aug 2014 05:48:07 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096924#comment-14096924
 ]


Michael McCandless commented on LUCENE-4396:
--------------------------------------------

I ran the default luceneutil tasks on the full English Wikipedia index, with
Java 1.8.0_05.  I used these JVM options to reduce hotspot noise:

{noformat}
  -server -XX:-UseTieredCompilation -Xbatch
{noformat}

I ran with a 4 GB heap. Results:

{noformat}
Report after iter 19:
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
               OrHighLow       22.04     (13.0%)       20.95      (7.8%)   -5.0% ( -22% -   18%)
              OrHighHigh       32.92     (12.5%)       31.63      (7.8%)   -3.9% ( -21% -   18%)
            OrHighNotMed       27.17      (7.3%)       26.41      (7.7%)   -2.8% ( -16% -   13%)
               OrHighMed       27.43     (14.1%)       26.72      (7.8%)   -2.6% ( -21% -   22%)
           OrHighNotHigh        7.14      (9.5%)        7.02      (8.1%)   -1.7% ( -17% -   17%)
             AndHighHigh       24.87      (3.9%)       24.67      (4.7%)   -0.8% (  -9% -    8%)
               MedPhrase       37.06      (7.8%)       36.78     (11.1%)   -0.8% ( -18% -   19%)
            OrHighNotLow       48.28     (13.2%)       48.21      (8.1%)   -0.1% ( -18% -   24%)
        HighSloppyPhrase       10.50      (3.2%)       10.48      (3.9%)   -0.1% (  -7% -    7%)
         MedSloppyPhrase       46.47      (4.3%)       46.49      (4.0%)    0.0% (  -7% -    8%)
             MedSpanNear        7.41      (5.2%)        7.43      (5.4%)    0.2% (  -9% -   11%)
                 LowTerm      347.65      (1.7%)      351.48      (1.7%)    1.1% (  -2% -    4%)
                  Fuzzy1       65.17     (13.9%)       65.96      (9.0%)    1.2% ( -18% -   27%)
                  IntNRQ        3.02      (7.9%)        3.06      (2.7%)    1.3% (  -8% -   12%)
              HighPhrase       41.20      (5.4%)       41.82      (2.5%)    1.5% (  -6% -    9%)
                  Fuzzy2       35.60     (11.8%)       36.23      (8.5%)    1.8% ( -16% -   25%)
                PKLookup      202.07      (3.0%)      205.78      (3.6%)    1.8% (  -4% -    8%)
              AndHighMed       64.22      (8.5%)       65.54      (1.6%)    2.1% (  -7% -   13%)
                Wildcard       27.13     (11.1%)       27.75      (5.9%)    2.3% ( -13% -   21%)
                 Respell       38.65     (10.3%)       39.58      (9.1%)    2.4% ( -15% -   24%)
               LowPhrase       14.73     (10.1%)       15.09      (6.6%)    2.5% ( -12% -   21%)
            HighSpanNear        3.02      (9.4%)        3.10      (3.9%)    2.9% (  -9% -   17%)
                 Prefix3       51.57     (11.0%)       53.19      (4.9%)    3.1% ( -11% -   21%)
                 MedTerm      148.60      (5.1%)      153.51      (2.1%)    3.3% (  -3% -   11%)
         LowSloppyPhrase       13.82      (7.7%)       14.29      (1.6%)    3.4% (  -5% -   13%)
             LowSpanNear       14.27      (8.9%)       14.78      (5.8%)    3.6% ( -10% -   20%)
                HighTerm       23.15      (9.1%)       24.05      (3.8%)    3.9% (  -8% -   18%)
              AndHighLow      466.12     (16.4%)      501.68     (13.8%)    7.6% ( -19% -   45%)
           OrNotHighHigh       14.83      (9.1%)       16.04      (5.8%)    8.1% (  -6% -   25%)
            OrNotHighMed        8.47      (9.7%)       10.20      (4.8%)   20.4% (   5% -   38%)
            OrNotHighLow       24.83      (9.5%)       30.66      (4.6%)   23.5% (   8% -   41%)
{noformat}

Looks like OrNot* got faster, but other Or* are maybe a bit slower...
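
One rough way to separate signal from noise in a report like this is to
treat a task as changed only when its whole Pct diff range lies on one side
of zero. The sketch below applies that rule to a few rows copied from the
report. The class name NoiseCheck and the method verdict are made up for
illustration and are not part of luceneutil; the rule itself is an
assumption about how to read the range column, not something the benchmark
tool enforces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NoiseCheck {
    /**
     * Hypothetical helper: classify a task from the low and high bounds of
     * its reported Pct diff range (both in percent).
     */
    public static String verdict(double lowPct, double highPct) {
        if (lowPct > 0) {
            return "faster";        // whole range above zero
        }
        if (highPct < 0) {
            return "slower";        // whole range below zero
        }
        return "inconclusive";      // range straddles zero
    }

    public static void main(String[] args) {
        // {low bound, high bound} of the Pct diff range, copied from the report.
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("OrHighLow", new double[] {-22, 18});
        rows.put("OrHighHigh", new double[] {-21, 18});
        rows.put("OrNotHighHigh", new double[] {-6, 25});
        rows.put("OrNotHighMed", new double[] {5, 38});
        rows.put("OrNotHighLow", new double[] {8, 41});

        for (Map.Entry<String, double[]> e : rows.entrySet()) {
            double[] range = e.getValue();
            System.out.println(e.getKey() + ": " + verdict(range[0], range[1]));
        }
    }
}
```

Under this reading only OrNotHighMed and OrNotHighLow are clearly faster;
the Or* slowdowns and the OrNotHighHigh gain stay within the noise.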


> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> SIZE.perf, all.perf, luceneutil-score-equal.patch, 
> luceneutil-score-equal.patch, merge.perf, merge.png, perf.png, stat.cpp, 
> stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have a very low hit count compared 
> to the other clauses, BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!
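
The heuristic in the issue description above could be sketched roughly as
follows. This is only an illustration of the idea, not code from any of the
attached patches: the class ScorerChoiceSketch, both methods, the hit-count
estimates passed in, and the ratio threshold are all hypothetical, and the
threshold value used in main is an untuned placeholder.

```java
public class ScorerChoiceSketch {
    /** The two scoring strategies discussed in the issue. */
    public enum Strategy { BOOLEAN_SCORER, BOOLEAN_SCORER2 }

    /**
     * Hypothetical heuristic: prefer BooleanScorer2 only when the MUST
     * clauses are expected to match far fewer docs than the other clauses.
     *
     * @param mustHits  estimated hit count of the most restrictive MUST clause (assumed input)
     * @param otherHits estimated hit count of the largest other clause (assumed input)
     * @param ratio     how many times smaller mustHits has to be; must be positive
     */
    public static Strategy choose(long mustHits, long otherHits, long ratio) {
        // Dividing instead of multiplying avoids long overflow on huge estimates.
        boolean mustIsVerySelective = mustHits < otherHits / ratio;
        return mustIsVerySelective ? Strategy.BOOLEAN_SCORER2 : Strategy.BOOLEAN_SCORER;
    }

    /**
     * Hypothetical check for the "sometimes use .advance()" idea: if the next
     * MUST match lies at or beyond the end of the current window of docs,
     * stepping the SHOULD clauses doc by doc through the gap is wasted work,
     * so they should be advanced instead.
     */
    public static boolean shouldAdvanceSubs(int windowEnd, int nextMustDoc) {
        return nextMustDoc >= windowEnd;
    }

    public static void main(String[] args) {
        // MUST clause is tiny compared to the rest: skipping wins.
        System.out.println(choose(10, 1_000_000, 100));
        // MUST clause is of comparable size: bulk scoring wins.
        System.out.println(choose(500_000, 1_000_000, 100));
        // MUST clause jumped far past the current window.
        System.out.println(shouldAdvanceSubs(2048, 1_000_000));
    }
}
```

How the hit counts would actually be estimated (the description suggests
firstDocID as a proxy) and what ratio works in practice are left open here.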



--
This message was sent by Atlassian JIRA
(v6.2#6252)
