[jira] [Updated] (LUCENE-7993) Speed up phrase queries when total hit count is not needed

Adrien Grand (JIRA) Fri, 13 Oct 2017 01:02:19 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adrien Grand updated LUCENE-7993:
---------------------------------
    Attachment: LUCENE-7993.patch

Here is a patch that applies on top of LUCENE-4100 to show the idea. Luceneutil 
confirms it brings interesting gains on wikimedium10m:

{noformat}
                    Task QPS baseline      StdDev   QPS patch      StdDev                Pct diff
            OrHighNotLow        88.30      (4.4%)       72.67      (2.4%)  -17.7% ( -23% -  -11%)
            OrHighNotMed        93.18      (3.3%)       86.58      (1.9%)   -7.1% ( -11% -   -1%)
            OrNotHighLow      1386.80      (4.0%)     1289.38      (3.3%)   -7.0% ( -13% -    0%)
           OrHighNotHigh        49.84      (3.2%)       47.59      (1.7%)   -4.5% (  -9% -    0%)
                  Fuzzy2       196.79     (16.6%)      188.44      (7.7%)   -4.2% ( -24% -   24%)
            HighSpanNear        58.01      (2.2%)       56.18      (2.4%)   -3.2% (  -7% -    1%)
            OrNotHighMed       184.60      (1.7%)      178.77      (2.4%)   -3.2% (  -7% -    0%)
              AndHighMed       224.60      (1.9%)      217.95      (2.3%)   -3.0% (  -7% -    1%)
             LowSpanNear       143.79      (2.4%)      139.98      (2.4%)   -2.7% (  -7% -    2%)
                  IntNRQ        19.47      (4.2%)       19.13      (5.0%)   -1.8% ( -10% -    7%)
                 MedTerm       248.95      (2.3%)      244.80      (1.9%)   -1.7% (  -5% -    2%)
                 LowTerm       766.37      (3.6%)      758.11      (3.9%)   -1.1% (  -8% -    6%)
                HighTerm       131.14      (2.5%)      129.74      (2.6%)   -1.1% (  -5% -    4%)
             AndHighHigh        30.70      (2.4%)       30.40      (1.5%)   -1.0% (  -4% -    3%)
           OrNotHighHigh        55.99      (2.7%)       55.50      (1.7%)   -0.9% (  -5% -    3%)
                 Prefix3       105.33      (4.8%)      104.60      (3.6%)   -0.7% (  -8% -    8%)
             MedSpanNear        13.38      (2.3%)       13.30      (2.1%)   -0.6% (  -4% -    3%)
                Wildcard        84.93      (4.8%)       84.59      (3.7%)   -0.4% (  -8% -    8%)
              AndHighLow      1419.89      (3.3%)     1432.43      (2.8%)    0.9% (  -4% -    7%)
         LowSloppyPhrase        38.50      (3.0%)       39.02      (1.7%)    1.3% (  -3% -    6%)
        HighSloppyPhrase        15.85      (4.2%)       16.10      (2.4%)    1.6% (  -4% -    8%)
         MedSloppyPhrase       118.20      (3.8%)      120.36      (2.4%)    1.8% (  -4% -    8%)
                 Respell       272.44      (6.5%)      279.22      (3.5%)    2.5% (  -7% -   13%)
       HighTermMonthSort       226.59      (9.1%)      233.94      (9.1%)    3.2% ( -13% -   23%)
                  Fuzzy1       163.36     (10.6%)      171.95      (8.7%)    5.3% ( -12% -   27%)
               LowPhrase       195.93      (2.2%)      222.77      (2.2%)   13.7% (   9% -   18%)
              OrHighHigh        34.58      (6.4%)       45.87      (6.8%)   32.6% (  18% -   49%)
   HighTermDayOfYearSort        65.42      (6.6%)       87.68     (12.5%)   34.0% (  14% -   56%)
               MedPhrase        40.05      (2.0%)       59.16      (2.3%)   47.7% (  42% -   53%)
               OrHighMed        41.35      (6.0%)       64.85      (7.3%)   56.8% (  41% -   74%)
              HighPhrase        22.51      (3.8%)       39.33      (4.0%)   74.8% (  64% -   85%)
               OrHighLow        61.15      (3.2%)      629.98     (41.3%)  930.3% ( 858% - 1007%)
{noformat}

The changes in disjunction performance are due to MAXSCORE. However, we can
see that {{LowPhrase}} (+13.7%), {{MedPhrase}} (+47.7%) and {{HighPhrase}}
(+74.8%) have good speedups too.

> Speed up phrase queries when total hit count is not needed
> ----------------------------------------------------------
>
>                 Key: LUCENE-7993
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7993
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7993.patch
>
>
> Follow-up of LUCENE-4100: When thinking about the API that we needed to
> introduce to support MAXSCORE, I wondered whether the same API could support
> other optimizations. The idea is that when running phrase queries, we already
> have access to the term frequency of each term before we start reading
> positions, and the frequency of the phrase is bounded by the minimum term
> frequency of the involved terms. So if the score for that minimum term
> frequency is not competitive, then the score for the phrase is not
> competitive either, provided we can assume that the score increases (or
> stagnates) when the term freq increases, which sounds like an ok requirement
> for a sane Similarity?
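
The bound described in the issue can be sketched roughly as follows. This is
only an illustration of the reasoning, not the attached patch and not Lucene's
API: the class name, the method names and the simplified BM25-like saturation
function (no IDF, no length normalization, k1 passed in by hand) are all
assumptions made for the example. The only property it relies on is the one
the description asks for, namely that the score does not decrease when the
frequency increases.

```java
/** Illustrative sketch only; none of these names exist in Lucene. */
public final class PhraseUpperBoundSketch {

    /** A phrase cannot occur more often in a document than its rarest term. */
    public static int maxPhraseFreq(int[] termFreqs) {
        if (termFreqs.length == 0) {
            throw new IllegalArgumentException("a phrase needs at least one term");
        }
        int min = Integer.MAX_VALUE;
        for (int tf : termFreqs) {
            min = Math.min(min, tf);
        }
        return min;
    }

    /**
     * Hypothetical stand-in for a Similarity: BM25-style tf saturation.
     * It is non-decreasing in freq, which is what the argument requires.
     */
    public static double score(double freq, double k1) {
        return freq * (k1 + 1) / (freq + k1);
    }

    /**
     * True if the document can be rejected from term frequencies alone,
     * i.e. without reading positions to compute the real phrase frequency.
     */
    public static boolean canSkipPositions(int[] termFreqs, double minCompetitiveScore) {
        double upperBound = score(maxPhraseFreq(termFreqs), 1.2);
        return upperBound < minCompetitiveScore;
    }

    public static void main(String[] args) {
        int[] termFreqs = {5, 2, 9}; // per-term frequencies in one document
        // The phrase frequency is at most 2, so its score is at most score(2).
        System.out.println(canSkipPositions(termFreqs, 1.5));
        System.out.println(canSkipPositions(termFreqs, 1.0));
    }
}
```

When the upper bound is still competitive, nothing is saved: positions have to
be read as before to get the actual phrase frequency. The check only helps
once the minimum competitive score has risen, which requires that the total
hit count is not needed.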



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

