[jira] [Commented] (LUCENE-7993) Speed up phrase queries when total hit count is not needed

Adrien Grand (JIRA) Thu, 26 Oct 2017 09:42:17 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220743#comment-16220743 ]


Adrien Grand commented on LUCENE-7993:
--------------------------------------

Benchmarks on wikibig this time, which is more appropriate since artificially
truncated documents defeat the purpose of this optimization. HighPhrase is now
about 3x faster (29.28 to 87.43 QPS).

{noformat}
                    Task  QPS baseline      StdDev   QPS patch      StdDev Pct diff
              OrHighHigh         97.15      (3.7%)       85.83      (3.6%)   -11.7% ( -18% -   -4%)
                  Fuzzy2        142.85      (8.7%)      131.63     (11.0%)    -7.9% ( -25% -   12%)
                  Fuzzy1        216.22      (9.6%)      200.10      (8.1%)    -7.5% ( -22% -   11%)
         MedSloppyPhrase          8.02      (7.4%)        7.78     (10.1%)    -3.0% ( -19% -   15%)
        HighSloppyPhrase         31.23      (5.7%)       30.59      (7.7%)    -2.0% ( -14% -   12%)
             MedSpanNear        124.68      (4.7%)      122.26      (4.7%)    -1.9% ( -10% -    7%)
             LowSpanNear         34.39      (8.2%)       33.90      (8.0%)    -1.4% ( -16% -   16%)
         LowSloppyPhrase         27.55      (5.1%)       27.28      (6.8%)    -1.0% ( -12% -   11%)
                  IntNRQ        164.57      (7.2%)      163.10      (8.5%)    -0.9% ( -15% -   16%)
            HighSpanNear         48.43      (4.5%)       48.03      (4.2%)    -0.8% (  -9% -    8%)
                 Respell        226.20      (3.1%)      225.11      (4.7%)    -0.5% (  -8% -    7%)
              AndHighLow       1211.79      (3.9%)     1211.37      (3.1%)    -0.0% (  -6% -    7%)
              AndHighMed        130.59      (2.0%)      130.71      (1.8%)     0.1% (  -3% -    3%)
       HighTermMonthSort        307.88      (7.8%)      308.47      (8.4%)     0.2% ( -14% -   17%)
                 MedTerm        361.52      (2.9%)      362.23      (2.8%)     0.2% (  -5% -    6%)
             AndHighHigh        114.80      (1.9%)      115.38      (1.8%)     0.5% (  -3% -    4%)
                 Prefix3        248.47      (5.0%)      249.86      (5.3%)     0.6% (  -9% -   11%)
                HighTerm        201.95      (2.9%)      203.53      (2.9%)     0.8% (  -4% -    6%)
                Wildcard        224.17      (4.4%)      226.12      (3.9%)     0.9% (  -7% -    9%)
                 LowTerm       1862.62      (3.6%)     1903.87      (4.2%)     2.2% (  -5% -   10%)
               OrHighMed        106.09      (4.6%)      145.10      (5.5%)    36.8% (  25% -   49%)
               LowPhrase         81.86      (5.9%)      112.43      (3.5%)    37.4% (  26% -   49%)
   HighTermDayOfYearSort        227.00      (7.3%)      312.89     (10.6%)    37.8% (  18% -   60%)
               MedPhrase         17.95     (14.2%)       43.93     (15.1%)   144.7% ( 101% -  202%)
              HighPhrase         29.28      (7.5%)       87.43      (8.6%)   198.6% ( 169% -  231%)
               OrHighLow        110.21      (3.9%)      835.01     (34.0%)   657.6% ( 596% -  723%)
{noformat}

> Speed up phrase queries when total hit count is not needed
> ----------------------------------------------------------
>
>                 Key: LUCENE-7993
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7993
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7993.patch
>
>
> Follow-up of LUCENE-4100: When thinking about the API that we needed to
> introduce to support MAXSCORE, I wondered whether the same API could support
> other optimizations. The idea is that when running phrase queries, before we
> start reading positions, we already have access to the term frequency of each
> term, and the frequency of the phrase is bounded by the minimum term
> frequency of the involved terms. So if the score for that minimum term
> frequency is not competitive, then the score for the phrase is not
> competitive either. This assumes that the score increases (or stagnates)
> when the term freq increases, which sounds like an ok requirement for a
> sane Similarity?
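
The bound described in the issue can be sketched in a few lines of Java. This is only an illustration of the reasoning and not the code from the attached patch: the class name, the method names, and the simplified BM25-style `score` function (which ignores length normalization) are all assumptions made for the example. The only property the sketch relies on is that the score never decreases when the frequency increases.

```java
import java.util.Arrays;

// Illustrative sketch only: names and the scoring formula are hypothetical,
// not taken from Lucene or from LUCENE-7993.patch.
public class PhraseFreqBound {

    // Simplified BM25-style term-frequency saturation, without length
    // normalization. It is non-decreasing in freq, which is the property
    // the optimization depends on.
    static double score(int freq, double idfWeight, double k1) {
        return idfWeight * freq * (k1 + 1) / (freq + k1);
    }

    // The phrase cannot occur more often in a document than its rarest term,
    // so the minimum term frequency is an upper bound on the phrase frequency.
    static int maxPhraseFreq(int[] termFreqs) {
        return Arrays.stream(termFreqs).min().orElse(0);
    }

    // Returns true when even the best possible phrase score for this document
    // is below the minimum competitive score, in which case positions do not
    // need to be read at all.
    static boolean canSkipPositions(int[] termFreqs, double idfWeight,
                                    double k1, double minCompetitiveScore) {
        double upperBound = score(maxPhraseFreq(termFreqs), idfWeight, k1);
        return upperBound < minCompetitiveScore;
    }

    public static void main(String[] args) {
        int[] termFreqs = {5, 2, 9}; // per-term frequencies in one document
        System.out.println(canSkipPositions(termFreqs, 1.0, 1.2, 1.5));
        System.out.println(canSkipPositions(termFreqs, 1.0, 1.2, 1.3));
    }
}
```

With these inputs the phrase frequency is at most 2, so the score upper bound is 1.375: the document can be skipped when the minimum competitive score is 1.5 but must be fully evaluated when it is 1.3. The check is only safe when the total hit count is not needed, since skipped documents may still match the phrase.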



