[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Da Huang (JIRA) Sun, 17 Aug 2014 20:52:53 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100255#comment-14100255
 ]


Da Huang commented on LUCENE-4396:
----------------------------------

I've tested again with exactly the same setup as Mike's.
Here are the results.
{code}
                Task  QPS baseline    StdDev  QPS my_version    StdDev  Pct diff
        HighSpanNear          1.05    (2.1%)            1.04    (2.1%)     -1.6%  (  -5% -    2%)
    HighSloppyPhrase          3.83    (5.3%)            3.78    (4.9%)     -1.3%  ( -10% -    9%)
             LowTerm         78.04    (4.5%)           77.13    (4.5%)     -1.2%  (  -9% -    8%)
         MedSpanNear          2.89    (3.9%)            2.86    (3.3%)     -1.1%  (  -8% -    6%)
         LowSpanNear          5.91    (4.9%)            5.84    (4.2%)     -1.1%  (  -9% -    8%)
            HighTerm          8.02   (12.1%)            7.94   (11.4%)     -1.0%  ( -21% -   25%)
         AndHighHigh          9.84    (1.9%)            9.74    (2.4%)     -1.0%  (  -5% -    3%)
             MedTerm         30.63    (4.7%)           30.35    (5.1%)     -0.9%  ( -10% -    9%)
     LowSloppyPhrase          5.83    (4.4%)            5.79    (4.5%)     -0.7%  (  -9% -    8%)
     MedSloppyPhrase         16.86    (4.5%)           16.75    (4.3%)     -0.6%  (  -9% -    8%)
           OrHighMed          7.57    (4.5%)            7.55    (4.1%)     -0.3%  (  -8% -    8%)
        OrNotHighLow          7.87    (5.3%)            7.84    (5.3%)     -0.3%  ( -10% -   10%)
          AndHighMed         25.10    (3.1%)           25.05    (3.7%)     -0.2%  (  -6% -    6%)
              Fuzzy2         10.80    (2.7%)           10.78    (2.9%)     -0.1%  (  -5% -    5%)
          OrHighHigh          8.75    (4.4%)            8.74    (4.1%)     -0.1%  (  -8% -    8%)
        OrHighNotMed          7.33    (4.4%)            7.33    (4.0%)     -0.1%  (  -8% -    8%)
       OrNotHighHigh          4.84    (5.1%)            4.84    (5.0%)     -0.1%  (  -9% -   10%)
           OrHighLow          6.67    (4.6%)            6.66    (4.5%)     -0.1%  (  -8% -    9%)
        OrNotHighMed          2.90    (5.2%)            2.89    (5.2%)     -0.1%  ( -10% -   10%)
       OrHighNotHigh          2.32    (4.9%)            2.32    (4.6%)     -0.0%  (  -9% -    9%)
              Fuzzy1         20.35    (3.1%)           20.38    (3.4%)      0.1%  (  -6% -    6%)
        OrHighNotLow         13.54    (4.5%)           13.56    (4.2%)      0.2%  (  -8% -    9%)
           MedPhrase         11.75    (3.2%)           11.78    (2.4%)      0.2%  (  -5% -    5%)
           LowPhrase          6.08    (2.9%)            6.09    (2.7%)      0.2%  (  -5% -    6%)
          HighPhrase         13.25    (3.8%)           13.29    (3.4%)      0.3%  (  -6% -    7%)
             Prefix3         19.78    (3.2%)           19.85    (3.9%)      0.4%  (  -6% -    7%)
             Respell         15.13    (3.1%)           15.19    (3.7%)      0.4%  (  -6% -    7%)
            Wildcard          8.82    (3.3%)            8.89    (4.9%)      0.8%  (  -7% -    9%)
              IntNRQ          0.85    (4.2%)            0.86    (6.0%)      1.3%  (  -8% -   12%)
          AndHighLow        172.85    (4.9%)          175.57    (4.7%)      1.6%  (  -7% -   11%)
{code}

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks, 
> LUCENE-4396-simple.patch, LUCENE-4396-simple.patch, LUCENE-4396-simple.patch, 
> LUCENE-4396-simple.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, SIZE.perf, all.perf, 
> luceneutil-score-equal.patch, luceneutil-score-equal.patch, 
> merge-simple.perf, merge-simple.png, merge.perf, merge.png, perf.png, 
> stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have a very low hit count compared 
> to the other clauses, BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST, so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as a proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, e.g. if the MUST clause suddenly skips 1000000 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!
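
For readers skimming the quoted description, here is a minimal sketch of the kind of heuristic it talks about. Everything in it is hypothetical: the class ScorerChoiceSketch, the methods choose and shouldAdvanceShoulds, the per-clause hit-count estimates, and both thresholds are illustrative assumptions. None of it is a Lucene API, and it is not taken from any of the attached patches.

```java
/** Hypothetical sketch only; not Lucene code and not the patch on this issue. */
final class ScorerChoiceSketch {

    /** The two scorers named in the issue description. */
    enum Choice { BOOLEAN_SCORER, BOOLEAN_SCORER2 }

    /**
     * Picks a scorer from estimated per-clause hit counts.
     * mustHits / otherHits are assumed to be available estimates (the issue
     * suggests firstDocID as a possible proxy); ratioThreshold is an
     * assumed tuning knob with no value given in the issue.
     */
    static Choice choose(long[] mustHits, long[] otherHits, double ratioThreshold) {
        if (mustHits.length == 0) {
            // Per the issue: only SHOULD and MUST_NOT clauses -> BooleanScorer.
            return Choice.BOOLEAN_SCORER;
        }
        long sparsestMust = Long.MAX_VALUE;
        for (long hits : mustHits) {
            sparsestMust = Math.min(sparsestMust, hits);
        }
        long densestOther = 0;
        for (long hits : otherHits) {
            densestOther = Math.max(densestOther, hits);
        }
        // A MUST clause with very few hits relative to the other clauses
        // lets BooleanScorer2 skip most documents, so prefer it in that case.
        if (sparsestMust < ratioThreshold * densestOther) {
            return Choice.BOOLEAN_SCORER2;
        }
        return Choice.BOOLEAN_SCORER;
    }

    /**
     * Returns true when the MUST clause jumped far enough ahead that calling
     * advance() on the SHOULD clauses is likely cheaper than stepping them.
     * gapThreshold is an assumed tuning knob.
     */
    static boolean shouldAdvanceShoulds(int currentDoc, int nextMustDoc, int gapThreshold) {
        return (long) nextMustDoc - currentDoc >= gapThreshold;
    }

    public static void main(String[] args) {
        System.out.println(choose(new long[] {10}, new long[] {1_000_000}, 0.01));
        System.out.println(choose(new long[] {500_000}, new long[] {1_000_000}, 0.01));
        System.out.println(shouldAdvanceShoulds(0, 1_000_000, 2048));
    }
}
```

The ratio test mirrors the "very low hit count compared to the other clauses" wording in the description. Real thresholds would have to come from benchmark runs like the one in the table above.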



--
This message was sent by Atlassian JIRA
(v6.2#6252)
