[jira] [Updated] (LUCENE-7330) Speed up conjunctions

Adrien Grand (JIRA) Fri, 10 Jun 2016 10:17:47 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adrien Grand updated LUCENE-7330:
---------------------------------
    Attachment: LUCENE-7330.patch

Here is a patch. It speeds up conjunctions through two changes:

First, it removes the 'if (doc == NO_MORE_DOCS) return NO_MORE_DOCS;' check at 
the top of doNext(). The check was needed because TwoPhaseConjunctionDISI 
extended ConjunctionDISI and it is illegal to call TwoPhaseIterator.matches() 
on NO_MORE_DOCS. I had to refactor how the two-phase iterator is exposed a bit, 
but I don't think it makes things more complicated.

Second, it adds a special case for the second least costly iterator so that we 
do not have to check whether it is already on the same document as the 'lead'. 
If you look at the implementation of doNext(), we currently have to guard the 
call to 'other.advance()' with an 'if (other.docID() < doc)', but we can avoid 
that check for the second least costly iterator without changing the order in 
which iterators are invoked.
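
To make the two changes concrete, here is a minimal, self-contained sketch of 
the loop shape described above. It is not the code from the patch: 
ConjunctionSketch, ArrayIterator and intersect() are hypothetical stand-ins 
over sorted int arrays, and only the control flow of doNext() is meant to 
mirror the description. The sketch assumes the caller always passes the 
current document of the lead iterator and stops once NO_MORE_DOCS is returned.

```java
import java.util.ArrayList;
import java.util.List;

class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a doc ID iterator, backed by a sorted int array. */
    static final class ArrayIterator {
        private final int[] docs;
        private int idx = -1;

        ArrayIterator(int[] sortedDocs) { this.docs = sortedDocs; }

        int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int nextDoc() { return advance(docID() + 1); }

        /** Moves to the first doc >= target; callers must pass target > docID(). */
        int advance(int target) {
            do { idx++; } while (idx < docs.length && docs[idx] < target);
            return docID();
        }
    }

    /** doc must be the current document of lead1. */
    static int doNext(ArrayIterator lead1, ArrayIterator lead2,
                      ArrayIterator[] others, int doc) {
        // No early return for NO_MORE_DOCS: advancing every iterator to
        // NO_MORE_DOCS makes them agree, so the loop exits on its own.
        advanceHead:
        for (;;) {
            // lead2 is always behind doc at this point, so its advance()
            // call needs no 'docID() < doc' guard.
            int next2 = lead2.advance(doc);
            if (next2 != doc) {
                doc = lead1.advance(next2);
                if (doc != next2) {
                    continue;
                }
            }
            // The remaining iterators may already be on doc, so they keep the guard.
            for (ArrayIterator other : others) {
                if (other.docID() < doc) {
                    int next = other.advance(doc);
                    if (next > doc) {
                        doc = lead1.advance(next);
                        continue advanceHead;
                    }
                }
            }
            return doc;
        }
    }

    /** Intersects sorted doc ID arrays; a and b play the two least costly clauses. */
    static List<Integer> intersect(int[] a, int[] b, int[]... rest) {
        ArrayIterator lead1 = new ArrayIterator(a);
        ArrayIterator lead2 = new ArrayIterator(b);
        ArrayIterator[] others = new ArrayIterator[rest.length];
        for (int i = 0; i < rest.length; i++) {
            others[i] = new ArrayIterator(rest[i]);
        }
        List<Integer> hits = new ArrayList<>();
        int doc = doNext(lead1, lead2, others, lead1.nextDoc());
        while (doc != NO_MORE_DOCS) {
            hits.add(doc);
            doc = doNext(lead1, lead2, others, lead1.nextDoc());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(intersect(
            new int[] {1, 3, 5, 8, 13},
            new int[] {1, 2, 3, 5, 8, 9, 13, 21},
            new int[] {3, 4, 5, 13, 14})); // prints [3, 5, 13]
    }
}
```

In this sketch the unguarded call is safe because lead2 is only ever 
positioned on a previous candidate, which is strictly smaller than the 
document that lead1 has just moved to.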

luceneutil reports the following numbers on wikimedium10m. There seems to be a 
noticeable gain for conjunction-based queries (And*, *Span* and *Phrase):

{noformat}
                    Task    QPS baseline      StdDev   QPS patch      StdDev                Pct diff
            OrHighNotLow      128.17      (9.1%)      126.04      (8.6%)   -1.7% ( -17% -   17%)
              OrHighHigh       14.75      (6.5%)       14.54      (5.8%)   -1.4% ( -12% -   11%)
               OrHighMed       66.53      (6.2%)       65.65      (5.8%)   -1.3% ( -12% -   11%)
               OrHighLow       85.42      (7.3%)       84.51      (6.7%)   -1.1% ( -14% -   13%)
                  Fuzzy1       68.08     (10.9%)       67.37     (10.2%)   -1.0% ( -19% -   22%)
            OrHighNotMed      133.66      (8.5%)      132.33      (7.5%)   -1.0% ( -15% -   16%)
           OrHighNotHigh       64.83      (4.6%)       64.36      (4.3%)   -0.7% (  -9% -    8%)
            OrNotHighLow     1150.80      (3.1%)     1144.91      (3.4%)   -0.5% (  -6% -    6%)
                  Fuzzy2       61.60     (22.2%)       61.31     (14.0%)   -0.5% ( -30% -   46%)
           OrNotHighHigh       22.30      (2.7%)       22.23      (2.6%)   -0.3% (  -5% -    5%)
            OrNotHighMed      155.90      (2.4%)      155.74      (2.7%)   -0.1% (  -5% -    5%)
                 Respell       94.52      (1.9%)       94.69      (1.9%)    0.2% (  -3% -    4%)
                Wildcard       66.04      (4.6%)       66.50      (4.4%)    0.7% (  -7% -   10%)
                 Prefix3      104.62      (4.7%)      105.54      (4.3%)    0.9% (  -7% -   10%)
                HighTerm       98.37      (5.3%)       99.65      (4.5%)    1.3% (  -8% -   11%)
              AndHighLow      612.09      (3.0%)      620.90      (2.6%)    1.4% (  -3% -    7%)
                 MedTerm      237.97      (4.9%)      241.93      (4.4%)    1.7% (  -7% -   11%)
                  IntNRQ       18.72      (9.4%)       19.05      (7.7%)    1.7% ( -13% -   20%)
         LowSloppyPhrase      108.80      (1.7%)      111.16      (2.2%)    2.2% (  -1% -    6%)
               MedPhrase      100.85      (2.2%)      103.08      (2.1%)    2.2% (  -2% -    6%)
             MedSpanNear       71.08      (2.2%)       73.09      (2.2%)    2.8% (  -1% -    7%)
                 LowTerm      623.38      (9.5%)      641.55      (7.7%)    2.9% ( -12% -   22%)
              HighPhrase       35.36      (3.2%)       36.42      (3.0%)    3.0% (  -3% -    9%)
             LowSpanNear       92.47      (2.9%)       95.41      (2.8%)    3.2% (  -2% -    9%)
        HighSloppyPhrase       31.99      (4.9%)       33.09      (4.8%)    3.5% (  -5% -   13%)
              AndHighMed      223.42      (1.6%)      231.21      (1.9%)    3.5% (   0% -    7%)
         MedSloppyPhrase       43.07      (2.5%)       45.13      (2.2%)    4.8% (   0% -    9%)
            HighSpanNear       28.57      (2.9%)       29.95      (3.6%)    4.8% (  -1% -   11%)
             AndHighHigh       74.55      (1.0%)       78.39      (1.6%)    5.2% (   2% -    7%)
               LowPhrase       19.97      (2.5%)       21.04      (2.9%)    5.4% (   0% -   10%)
{noformat}

> Speed up conjunctions
> ---------------------
>
>                 Key: LUCENE-7330
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7330
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7330.patch
>
>
> I am digging into some performance regressions between 4.x and 5.x which seem 
> to be due to how we always run conjunctions with ConjunctionDISI now while 
> 4.x had FilteredQuery, which was optimized for the case that there are only 
> two clauses or that one of the clauses supports random access. I'd like to 
> explore the former in this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
