[ https://issues.apache.org/jira/browse/LUCENE-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-6244:
---------------------------------
    Attachment: wikibig.tasks
                LUCENE-6244.patch

I agree it would be important for our benchmarks to track the performance of BS2, since this scorer is probably used quite often!

I worked a bit more on the patch in order to get back some performance. Because 
things are structured differently, I lost the guarantee that we confirm at most 
one clause per doc, but at least performance on simple queries is back (with 
BS1 disabled this time):

{noformat}
                    TaskQPS baseline      StdDev   QPS patch      StdDev                Pct diff
                 Respell       70.77      (3.2%)       69.96      (5.6%)   -1.1% (  -9% -    7%)
                  Fuzzy2       57.49      (7.8%)       57.03     (10.1%)   -0.8% ( -17% -   18%)
             AndHighHigh       90.13      (1.7%)       89.61      (2.1%)   -0.6% (  -4% -    3%)
                  IntNRQ        7.32      (5.2%)        7.28      (5.3%)   -0.5% ( -10% -   10%)
            OrNotHighLow      824.56      (3.5%)      821.47      (4.0%)   -0.4% (  -7% -    7%)
                HighTerm       73.82      (1.3%)       73.57      (1.1%)   -0.3% (  -2% -    2%)
               LowPhrase       74.18      (1.9%)       73.96      (1.9%)   -0.3% (  -4% -    3%)
            HighSpanNear       43.58      (3.4%)       43.49      (3.7%)   -0.2% (  -7% -    7%)
                 Prefix3       72.06      (3.9%)       71.91      (3.8%)   -0.2% (  -7% -    7%)
                PKLookup      265.53      (3.1%)      265.02      (2.8%)   -0.2% (  -5% -    5%)
              HighPhrase        4.24      (4.2%)        4.23      (4.4%)   -0.1% (  -8% -    8%)
           OrHighNotHigh       35.52      (1.5%)       35.51      (1.6%)   -0.0% (  -3% -    3%)
        HighSloppyPhrase       27.77      (2.4%)       27.77      (2.8%)   -0.0% (  -5% -    5%)
             LowSpanNear       24.53      (5.1%)       24.53      (5.7%)    0.0% ( -10% -   11%)
         MedSloppyPhrase       51.82      (2.5%)       51.83      (2.6%)    0.0% (  -5% -    5%)
           OrNotHighHigh       36.18      (1.0%)       36.20      (1.2%)    0.1% (  -2% -    2%)
         LowSloppyPhrase       96.11      (2.6%)       96.18      (2.8%)    0.1% (  -5% -    5%)
               MedPhrase      134.06      (2.0%)      134.18      (2.5%)    0.1% (  -4% -    4%)
                  Fuzzy1       64.22      (8.2%)       64.29      (6.3%)    0.1% ( -13% -   15%)
              AndHighMed      206.17      (1.8%)      206.47      (2.5%)    0.1% (  -4% -    4%)
                Wildcard       27.28      (2.3%)       27.32      (2.9%)    0.2% (  -4% -    5%)
             MedSpanNear       36.58      (3.6%)       36.64      (4.1%)    0.2% (  -7% -    8%)
              AndHighLow      882.47      (3.8%)      884.53      (4.4%)    0.2% (  -7% -    8%)
                 MedTerm      297.22      (1.1%)      297.91      (1.4%)    0.2% (  -2% -    2%)
            OrHighNotLow       80.63      (2.3%)       80.85      (2.5%)    0.3% (  -4% -    5%)
            OrHighNotMed       97.77      (2.3%)       98.11      (2.2%)    0.3% (  -4% -    4%)
            OrNotHighMed      189.36      (1.8%)      190.11      (1.8%)    0.4% (  -3% -    4%)
                 LowTerm      820.55      (2.9%)      830.32      (2.5%)    1.2% (  -4% -    6%)
              OrHighHigh       26.44      (4.5%)       27.58      (3.5%)    4.3% (  -3% -   12%)
               OrHighMed       59.16      (4.4%)       62.87      (4.2%)    6.3% (  -2% -   15%)
               OrHighLow        8.45      (4.5%)        9.10      (4.4%)    7.7% (  -1% -   17%)
{noformat}
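For readers following along, the idea behind approximations on a disjunction can be sketched roughly like this: each clause exposes a cheap approximation (a superset of its true matches) plus an expensive confirmation step that only runs lazily on candidate docs. This is a self-contained toy, not Lucene's actual Scorer/TwoPhaseIterator API; all names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model of two-phase iteration over a disjunction.
public class DisjunctionApprox {

    interface Clause {
        int[] approximation();     // sorted doc ids, a superset of the true matches
        boolean matches(int doc);  // expensive confirmation, called lazily
    }

    // A doc is a hit if ANY clause whose approximation contains it confirms it.
    static List<Integer> search(List<Clause> clauses) {
        // The disjunction's approximation is the union of the clause approximations.
        TreeSet<Integer> candidates = new TreeSet<>();
        for (Clause c : clauses) {
            for (int doc : c.approximation()) {
                candidates.add(doc);
            }
        }
        List<Integer> hits = new ArrayList<>();
        for (int doc : candidates) {
            for (Clause c : clauses) {
                // Short-circuit: stop confirming as soon as one clause matches,
                // so at most one expensive check succeeds per hit.
                if (Arrays.binarySearch(c.approximation(), doc) >= 0 && c.matches(doc)) {
                    hits.add(doc);
                    break;
                }
            }
        }
        return hits;
    }
}
```

The real scorers of course stream over postings rather than materializing sets, but the lazy-confirmation shape is the same.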

I also wanted to test the overhead of propagating approximations to other 
scorers such as conjunctions, so I modified the tasks from LUCENE-6198 to make 
them look like {{+("phrase" term1) +term2}} (see the attached file). Here are the 
results; I think they are encouraging.

{noformat}
                    TaskQPS baseline      StdDev   QPS patch      StdDev                Pct diff
    AndMedPhraseHighTerm       17.10      (2.3%)       15.47      (1.7%)   -9.5% ( -13% -   -5%)
   AndHighPhraseHighTerm        9.04      (2.0%)        8.95      (1.2%)   -1.0% (  -4% -    2%)
     AndMedPhraseLowTerm      129.01      (5.2%)      147.93      (9.2%)   14.7% (   0% -   30%)
    AndHighPhraseMedTerm       13.55      (2.4%)       15.90      (2.4%)   17.3% (  12% -   22%)
    AndHighPhraseLowTerm       31.49      (2.7%)       38.07      (3.8%)   20.9% (  13% -   28%)
     AndMedPhraseMedTerm       25.39      (2.6%)       37.93      (4.1%)   49.4% (  41% -   57%)
{noformat}
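As a rough illustration of where these speedups come from: in a conjunction, the phrase clause can contribute its approximation (docs containing all the phrase terms, ignoring positions), which gets leapfrog-intersected with the other required clause before the expensive positional check runs. The sketch below simplifies the query to one required phrase plus one required term; the names are illustrative, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model of a conjunction that intersects approximations first and
// verifies phrase positions lazily.
public class ConjunctionApprox {

    static List<Integer> search(int[] phraseApprox, int[] termDocs,
                                IntPredicate phraseMatches) {
        List<Integer> hits = new ArrayList<>();
        int i = 0, j = 0;
        // Leapfrog intersection of the two sorted doc-id lists.
        while (i < phraseApprox.length && j < termDocs.length) {
            if (phraseApprox[i] < termDocs[j]) {
                i++;
            } else if (phraseApprox[i] > termDocs[j]) {
                j++;
            } else {
                // Both required clauses agree on this doc: only now pay for
                // the positional check of the phrase.
                if (phraseMatches.test(phraseApprox[i])) {
                    hits.add(phraseApprox[i]);
                }
                i++;
                j++;
            }
        }
        return hits;
    }
}
```

When the other required clause is sparse, most phrase candidates are pruned before any positions are read, which is consistent with the larger gains on the LowTerm/MedTerm variants above.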

I also added more evil tests to TestApproximationSearchEquivalence.

> Approximations on disjunctions
> ------------------------------
>
>                 Key: LUCENE-6244
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6244
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>             Fix For: Trunk, 5.1
>
>         Attachments: LUCENE-6244.patch, LUCENE-6244.patch, wikibig.tasks
>
>
> Like we just did on exact phrases and conjunctions, we should also support 
> approximations on disjunctions in order to apply "matches()" lazily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
