[jira] [Commented] (LUCENE-6218) don't decode freqs or enumerate all positions, when scores are not needed

Robert Muir (JIRA) Wed, 04 Feb 2015 10:10:04 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305637#comment-14305637
 ]


Robert Muir commented on LUCENE-6218:
-------------------------------------

Here is the standard benchmark. You can see the optimization happening for the 
MUST_NOT clauses (OrNotHighMed and OrNotHighLow, the last two rows):
{noformat}
                    Task   QPS trunk      StdDev   QPS patch      StdDev                Pct diff
            OrHighNotLow      108.19      (4.1%)      105.11      (6.7%)   -2.8% ( -13% -    8%)
            OrHighNotMed       89.28      (3.7%)       87.15      (6.3%)   -2.4% ( -11% -    7%)
                HighTerm      120.82      (5.1%)      118.25      (6.1%)   -2.1% ( -12% -    9%)
                 MedTerm      177.26      (4.8%)      173.98      (5.8%)   -1.9% ( -11% -    9%)
                 LowTerm      950.16      (4.4%)      934.26      (4.6%)   -1.7% ( -10% -    7%)
           OrHighNotHigh       29.55      (3.2%)       29.14      (5.7%)   -1.4% (  -9% -    7%)
             MedSpanNear      144.83      (3.7%)      143.30      (4.5%)   -1.1% (  -8% -    7%)
                Wildcard       45.54      (5.3%)       45.17      (6.1%)   -0.8% ( -11% -   11%)
                 Prefix3      214.45      (5.5%)      213.06      (7.6%)   -0.6% ( -13% -   13%)
             LowSpanNear       28.04      (2.7%)       27.86      (3.3%)   -0.6% (  -6% -    5%)
              AndHighLow     1171.37      (2.4%)     1165.20      (3.0%)   -0.5% (  -5% -    5%)
            HighSpanNear      144.44      (3.9%)      143.73      (5.0%)   -0.5% (  -9% -    8%)
           OrNotHighHigh       49.49      (3.2%)       49.25      (5.8%)   -0.5% (  -9% -    8%)
                  IntNRQ        8.45      (7.7%)        8.41     (10.3%)   -0.5% ( -17% -   19%)
             AndHighHigh       88.18      (1.6%)       87.78      (1.9%)   -0.5% (  -3% -    3%)
              AndHighMed      123.35      (1.7%)      123.11      (1.8%)   -0.2% (  -3% -    3%)
                 Respell       89.47      (1.9%)       89.44      (1.4%)   -0.0% (  -3% -    3%)
                  Fuzzy1      109.20      (1.8%)      109.63      (1.3%)    0.4% (  -2% -    3%)
                  Fuzzy2       67.56      (2.1%)       67.85      (1.5%)    0.4% (  -3% -    4%)
               LowPhrase       34.54      (2.0%)       34.76      (1.9%)    0.6% (  -3% -    4%)
         LowSloppyPhrase      119.91      (2.6%)      120.75      (2.4%)    0.7% (  -4% -    5%)
              OrHighHigh       27.37      (9.3%)       27.71      (8.6%)    1.2% ( -15% -   21%)
               OrHighMed       58.23      (8.7%)       58.97      (8.0%)    1.3% ( -14% -   19%)
               OrHighLow       56.42      (8.7%)       57.23      (7.9%)    1.4% ( -13% -   19%)
         MedSloppyPhrase       15.92      (4.0%)       16.19      (4.3%)    1.7% (  -6% -   10%)
        HighSloppyPhrase       13.52     (12.1%)       13.77      (8.6%)    1.9% ( -16% -   25%)
              HighPhrase       17.50      (4.5%)       17.99      (4.2%)    2.8% (  -5% -   12%)
               MedPhrase      253.02      (5.7%)      261.32      (6.1%)    3.3% (  -8% -   15%)
            OrNotHighMed      185.01      (1.9%)      205.45      (3.6%)   11.0% (   5% -   16%)
            OrNotHighLow      959.96      (2.2%)     1144.49      (3.5%)   19.2% (  13% -   25%)
{noformat}
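
For readers checking the numbers: the leading figure in the Pct diff column is consistent with the relative change in mean QPS from trunk to patch. The sketch below reproduces it for three rows of the table. The class and method names are hypothetical and are not part of Lucene or its benchmark tooling, and the sketch does not attempt to reproduce the range shown in parentheses.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene or its benchmark tooling.
final class PctDiff {
    /** Relative change of patch QPS versus trunk QPS, in percent. */
    static double pctDiff(double qpsTrunk, double qpsPatch) {
        return 100.0 * (qpsPatch - qpsTrunk) / qpsTrunk;
    }

    /** Formats the change with one decimal place, as in the table. */
    static String format(double qpsTrunk, double qpsPatch) {
        return String.format(Locale.ROOT, "%.1f%%", pctDiff(qpsTrunk, qpsPatch));
    }

    public static void main(String[] args) {
        // QPS values copied from the table above.
        System.out.println("OrHighNotLow " + format(108.19, 105.11));
        System.out.println("OrNotHighMed " + format(185.01, 205.45));
        System.out.println("OrNotHighLow " + format(959.96, 1144.49));
    }
}
```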

> don't decode freqs or enumerate all positions, when scores are not needed
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-6218
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6218
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-6218.patch
>
>
> Today, if you don't call score(), some things are faster: we won't invoke 
> similarity, read the norm for the document, and so on.
> On the other hand, it's sad that in this case we decompress twice as 
> many packed integers as we need (freqs can be skipped over, and our postings 
> lists support that) and walk all positions in phrase matching to 
> determine the number of times the phrase matched (1 is enough, then we can 
> stop).
> When scoring is not needed, things can be optimized in other cases too (e.g. 
> that's the whole concept of filters).
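
The phrase-matching point in the issue description can be illustrated with a toy model. This is a minimal sketch and not the patch itself: the class, the method, and the `needsScores` flag are assumptions made for illustration, and none of them are Lucene APIs. It takes the sorted positions of two terms within a single document and counts exact two-term phrase occurrences. When scores are not needed, it returns as soon as the first occurrence is found.

```java
// Toy model for illustration only: none of these names are Lucene APIs.
final class PhraseFreqSketch {
    /**
     * Counts positions p in first[] such that p + 1 occurs in second[],
     * i.e. occurrences of the exact phrase "first second" in one document.
     * Both arrays must be sorted in strictly ascending order.
     * When needsScores is false, returns as soon as one occurrence is found.
     */
    static int phraseFreq(int[] first, int[] second, boolean needsScores) {
        int freq = 0;
        int i = 0;
        int j = 0;
        while (i < first.length && j < second.length) {
            int want = first[i] + 1; // position the second term must occupy
            if (second[j] < want) {
                j++;
            } else if (second[j] > want) {
                i++;
            } else {
                freq++;
                if (!needsScores) {
                    return freq; // one match is enough to know the document matches
                }
                i++;
                j++;
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        int[] first = {3, 10, 42};
        int[] second = {4, 11, 50};
        // Full count, as a scorer would need for the phrase frequency.
        System.out.println(phraseFreq(first, second, true));
        // Early exit: only answers "does the phrase occur at all?".
        System.out.println(phraseFreq(first, second, false));
    }
}
```

With scoring enabled the loop has to visit every candidate position to obtain the phrase frequency. Without it, the remaining positions are never examined once the first match is seen.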



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

