[ https://issues.apache.org/jira/browse/LUCENE-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-7421:
---------------------------------
    Attachment: LUCENE-7421.patch

Here is a patch. I benchmarked it on wikimedium10m with BS1 disabled:

{noformat}
                    Task QPS baseline      StdDev   QPS patch      StdDev                Pct diff
         MedSloppyPhrase        31.52      (5.8%)       31.38      (6.1%)   -0.4% ( -11% -   12%)
               LowPhrase        79.45      (3.7%)       79.12      (5.3%)   -0.4% (  -9% -    8%)
            OrNotHighMed       160.15      (3.4%)      159.62      (3.0%)   -0.3% (  -6% -    6%)
         LowSloppyPhrase        18.74      (6.8%)       18.70      (6.7%)   -0.2% ( -12% -   14%)
              AndHighLow       571.10      (5.8%)      570.18      (6.4%)   -0.2% ( -11% -   12%)
                HighTerm        93.87      (7.1%)       93.83      (6.1%)   -0.1% ( -12% -   14%)
             LowSpanNear       191.42      (4.2%)      191.59      (4.2%)    0.1% (  -8% -    8%)
            HighSpanNear         2.69      (4.8%)        2.70      (5.4%)    0.1% (  -9% -   10%)
            OrNotHighLow       766.17      (7.5%)      767.55      (5.4%)    0.2% ( -11% -   14%)
           OrHighNotHigh        56.81      (4.5%)       56.93      (4.4%)    0.2% (  -8% -    9%)
                 Respell        63.21      (6.6%)       63.39      (5.7%)    0.3% ( -11% -   13%)
        HighSloppyPhrase         2.78      (8.4%)        2.79      (8.0%)    0.4% ( -14% -   18%)
                  IntNRQ        11.20     (19.8%)       11.26     (19.5%)    0.5% ( -32% -   49%)
                 Prefix3        99.08      (8.3%)       99.59      (6.8%)    0.5% ( -13% -   17%)
                 MedTerm       224.98      (6.1%)      226.23      (5.4%)    0.6% ( -10% -   12%)
              AndHighMed       234.21      (3.9%)      235.65      (2.9%)    0.6% (  -5% -    7%)
                 LowTerm       565.85     (10.8%)      570.49     (11.3%)    0.8% ( -19% -   25%)
             AndHighHigh        66.68      (4.0%)       67.23      (3.2%)    0.8% (  -6% -    8%)
             MedSpanNear        55.15      (5.9%)       55.67      (3.6%)    0.9% (  -8% -   11%)
            OrHighNotLow        75.71      (7.8%)       76.44      (6.3%)    1.0% ( -12% -   16%)
                Wildcard        15.89      (8.5%)       16.05      (6.9%)    1.0% ( -13% -   17%)
           OrNotHighHigh        50.83      (5.4%)       51.38      (3.7%)    1.1% (  -7% -   10%)
               MedPhrase        31.99      (6.5%)       32.41      (2.9%)    1.3% (  -7% -   11%)
              HighPhrase        23.83      (5.4%)       24.18      (3.6%)    1.5% (  -7% -   11%)
                  Fuzzy1        39.46      (8.5%)       40.13      (7.0%)    1.7% ( -12% -   18%)
            OrHighNotMed        70.05      (6.8%)       71.36      (5.6%)    1.9% (  -9% -   15%)
              OrHighHigh        18.82      (6.0%)       19.57      (4.7%)    4.0% (  -6% -   15%)
                  Fuzzy2        49.95     (17.2%)       52.28     (17.2%)    4.7% ( -25% -   47%)
               OrHighMed        18.20      (8.8%)       20.01      (6.9%)    9.9% (  -5% -   28%)
               OrHighLow        47.39      (7.2%)       52.25      (6.0%)   10.2% (  -2% -   25%)
{noformat}

All 3 disjunctions got a performance boost, especially those whose clauses have 
very different doc frequencies: OrHighMed and OrHighLow.

> Speed up BS2 by caching the 2nd lowest doc id in the priority queue
> -------------------------------------------------------------------
>
>                 Key: LUCENE-7421
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7421
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7421.patch
>
>
> BS2 uses a priority queue to merge several sorted iterators into a 
> new sorted iterator. We call updateTop every time we move the 'top' 
> iterator forward, which requires checking the size of the priority queue at 
> least twice and performing at least two comparisons of doc ids.
> Instead, DisjunctionSumScorer could cache the 2nd lowest doc id of the 
> priority queue and only call updateTop when the doc id of the entry at the 
> top of the priority queue goes beyond the 2nd lowest doc id. While this would 
> involve slightly more work when the PQ has two high-cardinality 
> clauses whose docs are interleaved, it would help when one clause has a 
> much higher cardinality than the others or when the doc ids of the 
> various clauses are clustered.
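
The idea in the issue description can be illustrated with a small standalone sketch. This is not the attached patch and does not use Lucene's classes: CachedSecondLowestMerge, Cursor, Result, merge, secondLowest and downHeap are hypothetical names made up for this example, and the heap is a plain array-based binary min-heap. The sketch assumes each clause is a sorted, duplicate-free array of doc ids smaller than Integer.MAX_VALUE, and it only computes the union of doc ids (no scoring).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: every name below is hypothetical, not Lucene's API.
final class CachedSecondLowestMerge {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Cursor over one sorted, duplicate-free array of doc ids (stands in for a clause). */
    static final class Cursor {
        final int[] docs;
        int pos = -1;
        int doc = -1;
        Cursor(int[] docs) { this.docs = docs; }
        void nextDoc() { doc = ++pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
    }

    /** Merged doc ids plus the number of heap re-orderings that were needed. */
    static final class Result {
        final List<Integer> docs = new ArrayList<>();
        int updateTopCalls = 0;
    }

    /** Union of all clauses; useCache toggles the cached-2nd-lowest shortcut. */
    static Result merge(int[][] clauses, boolean useCache) {
        int size = clauses.length;
        Cursor[] heap = new Cursor[size + 1]; // 1-based binary min-heap on Cursor.doc
        for (int i = 0; i < size; i++) {
            heap[i + 1] = new Cursor(clauses[i]);
            heap[i + 1].nextDoc();
        }
        for (int i = size / 2; i >= 1; i--) downHeap(heap, size, i);

        Result result = new Result();
        if (size == 0) return result;
        Cursor top = heap[1];
        int second = secondLowest(heap, size);
        while (top.doc != NO_MORE_DOCS) {
            int doc = top.doc;
            result.docs.add(doc);
            do { // advance every cursor currently positioned on doc
                top.nextDoc();
                // With the cache, the heap is only re-ordered once the top
                // cursor has moved beyond the cached 2nd lowest doc id.
                if (!useCache || top.doc > second) {
                    downHeap(heap, size, 1); // plays the role of updateTop
                    result.updateTopCalls++;
                    top = heap[1];
                    second = secondLowest(heap, size);
                }
            } while (top.doc == doc);
        }
        return result;
    }

    /** The 2nd lowest doc id is the smaller of the root's (at most two) children. */
    static int secondLowest(Cursor[] heap, int size) {
        int second = NO_MORE_DOCS;
        if (size >= 2) second = heap[2].doc;
        if (size >= 3) second = Math.min(second, heap[3].doc);
        return second;
    }

    /** Standard sift-down of the entry at index i. */
    static void downHeap(Cursor[] heap, int size, int i) {
        Cursor node = heap[i];
        for (int child = 2 * i; child <= size; child = 2 * i) {
            if (child < size && heap[child + 1].doc < heap[child].doc) child++;
            if (heap[child].doc >= node.doc) break;
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = node;
    }

    public static void main(String[] args) {
        int[] high = new int[1000]; // high-cardinality clause: every doc id 0..999
        for (int i = 0; i < high.length; i++) high[i] = i;
        int[][] clauses = { high, {100, 500, 900} }; // plus one low-cardinality clause
        System.out.println("updateTop steps, always:  " + merge(clauses, false).updateTopCalls);
        System.out.println("updateTop steps, cached:  " + merge(clauses, true).updateTopCalls);
    }
}
```

With one dense clause and one sparse clause, as in main, the cached variant should need far fewer re-orderings than the variant that re-orders after every advance, which is the case the description expects to benefit most. The shortcut costs one extra comparison per advance, which matches the caveat about two interleaved high-cardinality clauses.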


