[jira] [Commented] (LUCENE-6201) MinShouldMatchSumScorer should advance less and score lazily

Adrien Grand (JIRA) Thu, 29 Jan 2015 08:27:13 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297093#comment-14297093
 ]


Adrien Grand commented on LUCENE-6201:
--------------------------------------

Hmm, actually I played with it and the results are not good. I modified the 
patch so that pure conjunctions are handled as disjunctions with 
minShouldMatch=number_of_optional_clauses. It uses the same heuristic as the 
current patch (cost > maxDoc / 3), so only AndHighHigh runs with 
BooleanScorer:

{noformat}
            Task   QPS baseline    StdDev   QPS patch    StdDev   Pct diff
     AndHighHigh          85.24    (1.8%)       57.61    (8.0%)     -32.4% ( -41% -  -23%)
          Fuzzy2          65.07   (12.3%)       59.37   (12.7%)      -8.8% ( -30% -   18%)
          Fuzzy1          41.38    (3.4%)       39.09    (5.1%)      -5.5% ( -13% -    3%)
      AndHighLow         781.86    (3.7%)      753.32    (3.7%)      -3.6% ( -10% -    3%)
HighSloppyPhrase          36.01    (2.9%)       35.20    (3.5%)      -2.2% (  -8% -    4%)
       OrHighLow          92.94    (4.4%)       91.18    (5.8%)      -1.9% ( -11% -    8%)
 MedSloppyPhrase          64.84    (3.0%)       63.67    (3.3%)      -1.8% (  -7% -    4%)
      AndHighMed         197.21    (2.0%)      194.06    (3.3%)      -1.6% (  -6% -    3%)
 LowSloppyPhrase          97.46    (2.0%)       96.00    (2.2%)      -1.5% (  -5% -    2%)
       OrHighMed          41.52    (4.8%)       41.15    (6.6%)      -0.9% ( -11% -   11%)
      HighPhrase          35.18    (7.3%)       34.97    (7.7%)      -0.6% ( -14% -   15%)
      OrHighHigh          25.83    (5.3%)       25.69    (6.9%)      -0.6% ( -12% -   12%)
       MedPhrase          31.25    (3.4%)       31.17    (3.8%)      -0.3% (  -7% -    7%)
   OrHighNotHigh          49.00    (1.9%)       48.87    (1.7%)      -0.3% (  -3% -    3%)
         MedTerm         246.68    (2.2%)      246.06    (2.2%)      -0.3% (  -4% -    4%)
    OrHighNotMed          40.13    (2.9%)       40.05    (2.8%)      -0.2% (  -5% -    5%)
        HighTerm          95.31    (2.3%)       95.20    (2.2%)      -0.1% (  -4% -    4%)
          IntNRQ           7.70    (4.7%)        7.69    (4.9%)      -0.1% (  -9% -    9%)
   OrNotHighHigh          29.34    (1.4%)       29.32    (2.2%)      -0.1% (  -3% -    3%)
        Wildcard          59.61    (2.9%)       59.60    (3.1%)      -0.0% (  -5% -    6%)
     LowSpanNear         115.48    (2.6%)      115.51    (2.4%)       0.0% (  -4% -    5%)
       LowPhrase         368.22    (3.1%)      368.37    (3.4%)       0.0% (  -6% -    6%)
         Prefix3          97.37    (3.7%)       97.52    (3.9%)       0.2% (  -7% -    8%)
         LowTerm         868.88    (4.8%)      870.53    (6.6%)       0.2% ( -10% -   12%)
        PKLookup         269.02    (3.2%)      269.55    (3.4%)       0.2% (  -6% -    7%)
    OrNotHighMed         113.70    (2.0%)      114.02    (2.5%)       0.3% (  -4% -    4%)
     MedSpanNear         109.45    (1.8%)      109.80    (1.8%)       0.3% (  -3% -    3%)
    OrHighNotLow         103.81    (3.1%)      104.17    (2.6%)       0.3% (  -5% -    6%)
    HighSpanNear          10.89    (2.1%)       10.94    (1.9%)       0.4% (  -3% -    4%)
         Respell          86.63    (6.1%)       87.09    (6.1%)       0.5% ( -10% -   13%)
    OrNotHighLow         967.00    (4.1%)     1019.09    (3.3%)       5.4% (  -1% -   13%)
{noformat}

BooleanScorer is not ready to run pure conjunctions. Actually this probably 
makes sense: one of the reasons why disjunctions (with or without 
minShouldMatch) are slow is that they need to update some priority queues all 
the time in order to figure out the next candidate. BooleanScorer speeds things 
up by only balancing priority queues once per window. On the other hand, 
ConjunctionScorer doesn't have such issues, so for conjunctions BooleanScorer 
mostly adds overhead by first collecting into a bit set and then recollecting 
from the bit set into the actual collector.
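
To make the two-pass pattern above concrete, here is a toy sketch in plain 
Java. It is not Lucene's BooleanScorer: the class name, the WINDOW_SIZE value 
and the representation of clauses as sorted int[] arrays of doc IDs are all 
assumptions made for illustration. It only shows the shape of the work: each 
clause is drained into a per-window bit set, then the bit set is replayed into 
the final hit list.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of window-based disjunction matching (illustrative, not Lucene code). */
public class WindowedDisjunctionSketch {

    // Assumed window size for this sketch.
    static final int WINDOW_SIZE = 2048;

    /**
     * Each clause is a sorted array of unique doc IDs in [0, maxDoc).
     * Returns the doc IDs that match at least minShouldMatch clauses.
     */
    static List<Integer> collect(int[][] clauses, int minShouldMatch, int maxDoc) {
        List<Integer> hits = new ArrayList<>();
        int[] positions = new int[clauses.length]; // read position per clause
        BitSet seen = new BitSet(WINDOW_SIZE);     // docs matched in the current window
        int[] counts = new int[WINDOW_SIZE];       // number of matching clauses per slot

        for (int windowStart = 0; windowStart < maxDoc; windowStart += WINDOW_SIZE) {
            int windowEnd = Math.min(windowStart + WINDOW_SIZE, maxDoc);

            // Pass 1: drain every clause into the window, with no per-document
            // reordering between clauses.
            for (int c = 0; c < clauses.length; c++) {
                int[] docs = clauses[c];
                int p = positions[c];
                while (p < docs.length && docs[p] < windowEnd) {
                    int slot = docs[p] - windowStart;
                    seen.set(slot);
                    counts[slot]++;
                    p++;
                }
                positions[c] = p;
            }

            // Pass 2: replay the bit set in doc ID order into the hit list.
            // This second pass is the extra work a conjunction does not need.
            for (int slot = seen.nextSetBit(0); slot >= 0; slot = seen.nextSetBit(slot + 1)) {
                if (counts[slot] >= minShouldMatch) {
                    hits.add(windowStart + slot);
                }
                counts[slot] = 0; // reset for the next window
            }
            seen.clear();
        }
        return hits;
    }
}
```

Calling collect with minShouldMatch equal to the number of clauses mimics the 
experiment above (a pure conjunction run as a disjunction): every posting of 
every clause is still written to the window and replayed, even though very few 
documents may match all clauses.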

> MinShouldMatchSumScorer should advance less and score lazily
> ------------------------------------------------------------
>
>                 Key: LUCENE-6201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6201
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: Trunk, 5.1
>
>         Attachments: LUCENE-6201.patch, LUCENE-6201.patch
>
>
> MinShouldMatchSumScorer currently computes the score eagerly, even on 
> documents that do not eventually match if it cannot find {{minShouldMatch}} 
> matches on the same document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

