[jira] [Updated] (LUCENE-6458) MultiTermQuery's FILTER rewrite method should support skipping whenever possible

Adrien Grand (JIRA) Wed, 29 Apr 2015 04:35:17 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adrien Grand updated LUCENE-6458:
---------------------------------
    Attachment: LUCENE-6458.patch

Here is a patch. It is quite similar to the old "auto" rewrite, except that it 
rewrites per segment and only consumes the filtered terms enum once. Queries 
are executed as regular disjunctions when there are 50 or fewer matching terms.
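
To make the per-segment decision described above concrete, here is a minimal, self-contained sketch. It is not the code from the attached patch and does not use Lucene classes: the names `PerSegmentRewriteSketch`, `SegmentPlan`, `rewriteSegment`, and `fakeSegment` are hypothetical, and an iterator of term-to-postings entries stands in for the filtered terms enum. Only the threshold of 50 comes from the message itself.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the per-segment rewrite; not Lucene code. */
public class PerSegmentRewriteSketch {

    // Threshold from the message above: 50 matching terms or fewer run as a disjunction.
    static final int MAX_DISJUNCTION_TERMS = 50;

    /** Outcome for one segment: exactly one of the two fields is non-null. */
    static final class SegmentPlan {
        final List<String> disjunctionTerms; // few terms: run as a regular disjunction
        final BitSet matchingDocs;           // many terms: doc IDs collected in a bit set

        SegmentPlan(List<String> disjunctionTerms, BitSet matchingDocs) {
            this.disjunctionTerms = disjunctionTerms;
            this.matchingDocs = matchingDocs;
        }

        boolean isDisjunction() {
            return disjunctionTerms != null;
        }
    }

    /**
     * Consumes the segment's matching terms exactly once. Each entry maps a term
     * to its postings (doc IDs), a simplification of a filtered terms enum.
     */
    static SegmentPlan rewriteSegment(Iterator<Map.Entry<String, int[]>> matchingTerms) {
        List<Map.Entry<String, int[]>> buffered = new ArrayList<>();
        while (matchingTerms.hasNext()) {
            Map.Entry<String, int[]> term = matchingTerms.next();
            if (buffered.size() < MAX_DISJUNCTION_TERMS) {
                buffered.add(term);
                continue;
            }
            // Too many terms: switch to a bit set without restarting the iteration.
            BitSet docs = new BitSet();
            for (Map.Entry<String, int[]> seen : buffered) {
                addAll(docs, seen.getValue());
            }
            addAll(docs, term.getValue());
            while (matchingTerms.hasNext()) {
                addAll(docs, matchingTerms.next().getValue());
            }
            return new SegmentPlan(null, docs);
        }
        // The iterator was exhausted within the threshold: keep the terms themselves.
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, int[]> seen : buffered) {
            terms.add(seen.getKey());
        }
        return new SegmentPlan(terms, null);
    }

    private static void addAll(BitSet docs, int[] postings) {
        for (int doc : postings) {
            docs.set(doc);
        }
    }

    /** Builds a fake segment with termCount terms; term i matches docs i and i + 1. */
    static TreeMap<String, int[]> fakeSegment(int termCount) {
        TreeMap<String, int[]> terms = new TreeMap<>();
        for (int i = 0; i < termCount; i++) {
            terms.put(String.format("term%04d", i), new int[] {i, i + 1});
        }
        return terms;
    }

    public static void main(String[] args) {
        SegmentPlan small = rewriteSegment(fakeSegment(50).entrySet().iterator());
        SegmentPlan large = rewriteSegment(fakeSegment(51).entrySet().iterator());
        System.out.println("50 terms, disjunction: " + small.isDisjunction());
        System.out.println("51 terms, disjunction: " + large.isDisjunction());
    }
}
```

The buffered terms are replayed into the bit set when the threshold is crossed, so the iterator never has to be reset, which is one way to read "only consumes the filtered terms enum once".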

{noformat}
                    Task QPS baseline      StdDev   QPS patch      StdDev  Pct diff
                 Prefix3       113.17      (1.7%)       88.55      (2.7%)  -21.8% ( -25% -  -17%)
                Wildcard        37.43      (2.0%)       36.26      (3.2%)   -3.1% (  -8% -    2%)
            HighSpanNear         4.30      (2.6%)        4.24      (4.0%)   -1.6% (  -7% -    5%)
            OrHighNotLow        71.52      (1.5%)       70.51      (3.1%)   -1.4% (  -5% -    3%)
        HighSloppyPhrase        20.60      (6.3%)       20.34      (7.6%)   -1.3% ( -14% -   13%)
            OrHighNotMed        96.14      (2.0%)       95.11      (2.8%)   -1.1% (  -5% -    3%)
               MedPhrase        23.49      (1.8%)       23.30      (3.5%)   -0.8% (  -6% -    4%)
                 Respell        62.25      (8.9%)       62.01      (7.4%)   -0.4% ( -15% -   17%)
             AndHighHigh        52.43      (0.7%)       52.27      (1.1%)   -0.3% (  -2% -    1%)
           OrNotHighHigh        26.08      (3.5%)       26.02      (1.0%)   -0.2% (  -4% -    4%)
           OrHighNotHigh        61.96      (2.0%)       61.85      (2.1%)   -0.2% (  -4% -    4%)
                  IntNRQ         8.03      (3.1%)        8.02      (2.6%)   -0.2% (  -5% -    5%)
                 LowTerm       783.62      (4.9%)      783.25      (4.5%)   -0.0% (  -9% -    9%)
             MedSpanNear        18.77      (1.9%)       18.76      (3.6%)   -0.0% (  -5% -    5%)
             LowSpanNear        14.49      (2.5%)       14.49      (2.6%)   -0.0% (  -4% -    5%)
                 MedTerm       237.81      (2.1%)      237.76      (3.0%)   -0.0% (  -4% -    5%)
                PKLookup       266.15      (2.5%)      266.38      (2.5%)    0.1% (  -4% -    5%)
               OrHighMed        50.61      (6.0%)       50.68      (6.1%)    0.1% ( -11% -   13%)
                  Fuzzy2        19.87      (4.4%)       19.92      (7.8%)    0.2% ( -11% -   12%)
            OrNotHighMed        90.03      (1.1%)       90.25      (0.8%)    0.2% (  -1% -    2%)
              HighPhrase        15.56      (2.0%)       15.61      (2.7%)    0.3% (  -4% -    5%)
         MedSloppyPhrase       252.97      (5.2%)      253.93      (4.3%)    0.4% (  -8% -   10%)
               LowPhrase         8.16      (1.7%)        8.21      (1.9%)    0.6% (  -2% -    4%)
                HighTerm       115.17      (2.4%)      116.05      (2.7%)    0.8% (  -4% -    6%)
              OrHighHigh        25.19      (5.7%)       25.45      (6.4%)    1.0% ( -10% -   13%)
               OrHighLow        42.12      (7.5%)       42.60      (6.9%)    1.1% ( -12% -   16%)
         LowSloppyPhrase       129.20      (1.6%)      130.68      (2.0%)    1.2% (  -2% -    4%)
              AndHighMed       231.64      (1.3%)      235.28      (2.1%)    1.6% (  -1% -    4%)
              AndHighLow       733.51      (3.9%)      751.08      (3.5%)    2.4% (  -4% -   10%)
                  Fuzzy1        85.42     (17.0%)       91.04      (5.9%)    6.6% ( -13% -   35%)
            OrNotHighLow       893.55      (2.9%)      962.35      (4.6%)    7.7% (   0% -   15%)
{noformat}

I was hoping it would kick in for numeric range queries but unfortunately they 
often need to match hundreds of terms. I'm wondering if it would be different 
for auto-prefix.

Prefix3 and Wildcard are a bit slower because they now actually get executed 
as regular disjunctions. I think the slowdown is fair given that the patch 
also requires less memory and provides true skipping support (which the 
benchmark doesn't exercise).
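
As a rough illustration of what skipping support buys, the sketch below contrasts the two execution styles on plain sorted arrays. It is an assumption-laden toy, not Lucene code: `SkippingSketch`, `Postings`, `advanceDisjunction`, `buildBitSet`, and `multiplesOf` are hypothetical names, and a binary search stands in for whatever skip data real postings provide. The disjunction can jump each term's postings straight to a target doc ID, while the bit set has to read every posting of every term before it can answer anything.

```java
import java.util.BitSet;

/** Hypothetical illustration of skipping; not Lucene code. */
public class SkippingSketch {

    /** Cursor over one term's sorted doc IDs that counts the postings it inspects. */
    static final class Postings {
        final int[] docs;
        int pos = 0;
        long inspected = 0;

        Postings(int[] docs) {
            this.docs = docs;
        }

        /** Current doc ID, or Integer.MAX_VALUE once the postings are exhausted. */
        int doc() {
            return pos < docs.length ? docs[pos] : Integer.MAX_VALUE;
        }

        /** Moves to the first doc >= target; binary search stands in for skip data. */
        void advance(int target) {
            int lo = pos;
            int hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                inspected++;
                if (docs[mid] < target) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            pos = lo;
        }
    }

    /** Disjunction: returns the first doc >= target that matches any term. */
    static int advanceDisjunction(Postings[] terms, int target) {
        int min = Integer.MAX_VALUE;
        for (Postings p : terms) {
            if (p.doc() < target) {
                p.advance(target);
            }
            min = Math.min(min, p.doc());
        }
        return min;
    }

    /** Bit set approach: every posting of every term is read up front. */
    static BitSet buildBitSet(int[][] postingsPerTerm) {
        BitSet bits = new BitSet();
        for (int[] postings : postingsPerTerm) {
            for (int doc : postings) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Returns the sorted doc IDs 0, step, 2 * step, ... (count entries). */
    static int[] multiplesOf(int step, int count) {
        int[] docs = new int[count];
        for (int i = 0; i < count; i++) {
            docs[i] = i * step;
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] evens = multiplesOf(2, 10000);
        int[] threes = multiplesOf(3, 10000);
        Postings[] terms = {new Postings(evens), new Postings(threes)};

        int viaDisjunction = advanceDisjunction(terms, 15001);
        int viaBitSet = buildBitSet(new int[][] {evens, threes}).nextSetBit(15001);

        System.out.println("disjunction: " + viaDisjunction + ", bit set: " + viaBitSet);
        System.out.println("postings inspected by the disjunction: "
            + (terms[0].inspected + terms[1].inspected)
            + ", postings read for the bit set: " + (evens.length + threes.length));
    }
}
```

Both approaches find the same next match; the difference is in how many postings had to be touched to get there, which only matters when the query is combined with something that asks it to skip.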

> MultiTermQuery's FILTER rewrite method should support skipping whenever 
> possible
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-6458
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6458
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6458.patch
>
>
> Today MultiTermQuery's FILTER rewrite always builds a bit set from all 
> matching terms. This means that we need to consume the entire postings lists 
> of all matching terms. Instead we should try to execute like regular 
> disjunctions when there are few terms.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
