[ 
https://issues.apache.org/jira/browse/LUCENE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-6458:
---------------------------------
    Attachment: LUCENE-6458.patch
                wikimedium.10M.nostopwords.tasks

I did some more benchmarking of the change with filters (see attached tasks 
file) and various thresholds (and a fixed seed):

{noformat}
16
                    Task  QPS baseline      StdDev   QPS patch      StdDev                Pct diff
                     MTQ         24.33      (7.5%)       20.67      (7.3%)   -15.1% ( -27% -    0%)
                  IntNRQ         20.38      (7.3%)       17.85     (11.9%)   -12.4% ( -29% -    7%)
               IntNRQ_50          8.94     (10.1%)        8.67      (8.6%)    -3.0% ( -19% -   17%)
                  MTQ_50          9.05      (7.9%)        8.93      (5.3%)    -1.3% ( -13% -   12%)
               IntNRQ_10         13.72     (12.7%)       13.60     (11.9%)    -0.9% ( -22% -   27%)
                IntNRQ_1         17.53     (17.1%)       17.53     (16.3%)     0.0% ( -28% -   40%)
                  MTQ_10         13.70     (11.2%)       13.89      (8.7%)     1.4% ( -16% -   23%)
                   MTQ_1         19.11     (15.8%)       21.43     (18.0%)    12.1% ( -18% -   54%)

64
                    Task  QPS baseline      StdDev   QPS patch      StdDev                Pct diff
                  IntNRQ         20.53      (6.9%)       16.42      (5.3%)   -20.0% ( -30% -   -8%)
                     MTQ         24.31      (7.3%)       20.34      (6.4%)   -16.3% ( -27% -   -2%)
               IntNRQ_50          8.87      (9.2%)        8.31      (6.5%)    -6.3% ( -20% -   10%)
               IntNRQ_10         13.55     (12.7%)       12.80     (10.2%)    -5.6% ( -25% -   19%)
                IntNRQ_1         17.27     (16.3%)       16.38     (13.1%)    -5.2% ( -29% -   28%)
                  MTQ_50          9.00      (7.6%)        9.02      (4.5%)     0.3% ( -10% -   13%)
                  MTQ_10         13.65     (11.1%)       14.73      (8.2%)     7.9% ( -10% -   30%)
                   MTQ_1         18.95     (15.1%)       25.32     (17.2%)    33.6% (   1% -   77%)

256
                    Task  QPS baseline      StdDev   QPS patch      StdDev                Pct diff
                  IntNRQ         20.43      (9.4%)       12.69      (1.7%)   -37.9% ( -44% -  -29%)
                     MTQ         24.13      (9.3%)       19.32      (5.3%)   -19.9% ( -31% -   -5%)
                IntNRQ_1         17.21     (19.5%)       13.90      (7.7%)   -19.2% ( -38% -    9%)
               IntNRQ_10         13.49     (12.7%)       10.95      (5.7%)   -18.8% ( -33% -    0%)
               IntNRQ_50          8.85     (10.5%)        7.40      (3.8%)   -16.4% ( -27% -   -2%)
                  MTQ_50          8.94      (8.3%)        8.82      (4.4%)    -1.3% ( -12% -   12%)
                  MTQ_10         13.53     (12.6%)       14.64      (5.9%)     8.2% (  -9% -   30%)
                   MTQ_1         18.88     (15.6%)       26.52     (14.2%)    40.5% (   9% -   83%)

1024
                    Task  QPS baseline      StdDev   QPS patch      StdDev                Pct diff
                  IntNRQ         20.40      (7.7%)        6.54      (1.5%)   -67.9% ( -71% -  -63%)
                IntNRQ_1         17.57     (17.2%)        8.27      (2.9%)   -52.9% ( -62% -  -39%)
               IntNRQ_10         13.66     (13.0%)        6.72      (2.4%)   -50.8% ( -58% -  -40%)
               IntNRQ_50          8.96     (10.4%)        5.01      (1.5%)   -44.1% ( -50% -  -35%)
                     MTQ         24.41      (8.2%)       18.07      (4.4%)   -26.0% ( -35% -  -14%)
                  MTQ_50          9.05      (8.1%)        8.65      (3.5%)    -4.5% ( -14% -    7%)
                  MTQ_10         13.60     (11.5%)       14.41      (3.9%)     6.0% (  -8% -   24%)
                   MTQ_1         19.11     (15.6%)       27.32     (12.9%)    43.0% (  12% -   84%)
{noformat}

Rewriting to a BooleanQuery never helps when there is no filter. One thing the 
benchmark doesn't capture, though, is that BooleanQuery at least does not 
allocate O(maxDoc) memory, which can matter for large datasets.
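
To put a rough number on the O(maxDoc) point, here is a minimal sketch in plain Java. It assumes a dense bit set that stores one bit per document in 64-bit words; the class and method names are hypothetical and not part of the patch.

```java
final class BitSetFootprint {
    /**
     * Approximate heap bytes for a dense bit set covering maxDoc documents,
     * assuming one bit per document packed into 64-bit words.
     */
    static long denseBitSetBytes(int maxDoc) {
        long words = ((long) maxDoc + 63) / 64; // round up to whole words
        return words * Long.BYTES;              // 8 bytes per word
    }
}
```

Under that assumption, a 10M-document index needs about 1.25 MB for each such bit set, regardless of how few documents actually match.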

When there are filters, it's more complicated: it depends on the density of the 
filter, on the number of terms and also apparently on how the frequencies of the 
different terms compare (this is my current theory for why WildcardQuery 
performs better than NRQ).

Net/net I think this validates that 64 would be a good threshold for the rewrite, 
with a minimal slowdown when filters are dense, and interesting speedups when 
filters are sparse?
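
For readers who want the two execution strategies side by side, here is a toy sketch in plain Java. It does not use Lucene classes: postings are sorted int arrays, binary search stands in for skipping, and the class name, method names, and the `<=` comparison against the threshold are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class FilterRewriteSketch {
    // Assumed cutoff; 64 is the value the benchmarks above point to.
    static final int THRESHOLD = 64;

    /** Bit-set strategy: consume every posting of every term, then intersect with the filter. */
    static List<Integer> viaBitSet(int[][] postings, int[] filter, int maxDoc) {
        BitSet bits = new BitSet(maxDoc); // sized by maxDoc, not by the number of matches
        for (int[] termPostings : postings) {
            for (int doc : termPostings) {
                bits.set(doc);
            }
        }
        List<Integer> hits = new ArrayList<>();
        for (int doc : filter) {
            if (bits.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Disjunction strategy: the filter leads, and each term is only probed for the filter's docs. */
    static List<Integer> viaDisjunction(int[][] postings, int[] filter) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : filter) {
            for (int[] termPostings : postings) {
                // Binary search is a stand-in for skipping within a sorted postings list.
                if (Arrays.binarySearch(termPostings, doc) >= 0) {
                    hits.add(doc);
                    break; // one matching term is enough for a disjunction
                }
            }
        }
        return hits;
    }

    /** Pick a strategy from the number of matching terms. */
    static List<Integer> search(int[][] postings, int[] filter, int maxDoc) {
        return postings.length <= THRESHOLD
                ? viaDisjunction(postings, filter)
                : viaBitSet(postings, filter, maxDoc);
    }
}
```

Both strategies return the same documents; the difference is that the bit-set path always reads every posting of every term, while the disjunction path does work proportional to the filter size times the number of terms, which is why it can only pay off for few terms and sparse filters.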

> MultiTermQuery's FILTER rewrite method should support skipping whenever 
> possible
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-6458
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6458
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6458.patch, LUCENE-6458.patch, 
> wikimedium.10M.nostopwords.tasks
>
>
> Today MultiTermQuery's FILTER rewrite always builds a bit set from all 
> matching terms. This means that we need to consume the entire postings lists 
> of all matching terms. Instead we should try to execute like regular 
> disjunctions when there are few terms.


