[jira] [Updated] (LUCENE-6198) two phase intersection

Adrien Grand (JIRA) Wed, 11 Feb 2015 11:11:59 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adrien Grand updated LUCENE-6198:
---------------------------------
    Attachment: LUCENE-6198.patch

New patch that adds two-phase support to ConjunctionScorer. luceneutil seems 
happy with the patch too:

{noformat}
                    Task  QPS baseline      StdDev   QPS patch      StdDev                Pct diff
              HighPhrase         12.26     (11.3%)       11.89      (5.3%)   -3.0% ( -17% -   15%)
              AndHighLow        894.95      (9.5%)      874.08      (2.9%)   -2.3% ( -13% -   11%)
               LowPhrase         18.81      (9.2%)       18.51      (4.8%)   -1.6% ( -14% -   13%)
                  Fuzzy1         72.76     (12.2%)       71.65      (9.6%)   -1.5% ( -20% -   23%)
               MedPhrase         54.31     (11.0%)       53.81      (3.2%)   -0.9% ( -13% -   14%)
                 LowTerm        806.00     (11.9%)      808.20      (4.5%)    0.3% ( -14% -   18%)
                 Respell         55.89     (10.2%)       56.57      (4.2%)    1.2% ( -11% -   17%)
            OrNotHighLow       1102.88     (11.4%)     1116.63      (4.3%)    1.2% ( -13% -   19%)
             LowSpanNear          9.48      (9.5%)        9.61      (4.4%)    1.4% ( -11% -   16%)
         LowSloppyPhrase         71.86      (8.8%)       72.89      (3.5%)    1.4% (  -9% -   15%)
         MedSloppyPhrase         29.92     (10.3%)       30.35      (4.2%)    1.4% ( -11% -   17%)
             MedSpanNear         79.24      (8.6%)       80.39      (3.2%)    1.5% (  -9% -   14%)
                  IntNRQ         16.81      (9.4%)       17.06      (6.1%)    1.5% ( -12% -   18%)
        HighSloppyPhrase         23.27     (11.6%)       23.64      (8.1%)    1.6% ( -16% -   24%)
              OrHighHigh         16.79     (10.6%)       17.08      (7.7%)    1.7% ( -15% -   22%)
            OrHighNotLow         84.84     (10.3%)       86.32      (3.2%)    1.7% ( -10% -   17%)
           OrNotHighHigh         56.28      (9.4%)       57.30      (1.9%)    1.8% (  -8% -   14%)
                HighTerm        123.91     (10.8%)      126.29      (2.8%)    1.9% ( -10% -   17%)
                 MedTerm        243.44     (11.1%)      248.40      (2.9%)    2.0% ( -10% -   18%)
                Wildcard         74.84      (9.9%)       76.36      (3.1%)    2.0% (  -9% -   16%)
           OrHighNotHigh         45.48      (9.9%)       46.47      (1.9%)    2.2% (  -8% -   15%)
               OrHighLow         79.36     (11.3%)       81.10      (6.5%)    2.2% ( -14% -   22%)
                 Prefix3         74.29     (10.5%)       75.96      (4.9%)    2.2% ( -11% -   19%)
            OrHighNotMed         53.37     (10.7%)       54.62      (2.5%)    2.3% (  -9% -   17%)
                PKLookup        266.92     (10.4%)      273.30      (3.4%)    2.4% ( -10% -   18%)
            HighSpanNear         19.64     (10.4%)       20.11      (3.0%)    2.4% (  -9% -   17%)
            OrNotHighMed        167.57     (11.7%)      171.67      (2.4%)    2.4% ( -10% -   18%)
               OrHighMed         72.90     (12.5%)       74.87      (6.6%)    2.7% ( -14% -   24%)
                  Fuzzy2         50.70     (13.8%)       52.58      (8.4%)    3.7% ( -16% -   30%)
              AndHighMed        160.13     (10.1%)      169.60      (3.4%)    5.9% (  -6% -   21%)
             AndHighHigh         69.49      (8.8%)       74.19      (3.3%)    6.8% (  -4% -   20%)
{noformat}

> two phase intersection
> ----------------------
>
>                 Key: LUCENE-6198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6198
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6198.patch, LUCENE-6198.patch, LUCENE-6198.patch
>
>
> Currently some scorers have to do a lot of per-document work to determine if 
> a document is a match. The simplest example is a phrase scorer, but there are 
> others (spans, sloppy phrase, geospatial, etc.).
> Imagine a conjunction with two MUST clauses, one that is a term that matches 
> all odd documents, another that is a phrase matching all even documents. 
> Today this conjunction will be very expensive, because the zig-zag 
> intersection is reading a ton of useless positions.
> The same problem happens with FilteredQuery and anything else that acts like 
> a conjunction.
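
As a rough illustration of the idea in the description above (not the code in the attached patch, and not Lucene's API), a two-phase conjunction can be sketched as follows: first intersect cheap approximations of each clause, then run the expensive per-document check only on documents where the approximations agree. All names here (TwoPhaseSketch, Approximation, Clause, stepping, countMatches) are hypothetical and exist only for this example.

```java
import java.util.function.IntPredicate;

class TwoPhaseSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Cheap, position-free iterator over candidate doc IDs (hypothetical interface). */
    interface Approximation {
        /** Returns the first candidate >= target, or NO_MORE_DOCS if there is none. */
        int advance(int target);
    }

    /** A clause: a cheap approximation plus an expensive per-document check. */
    static final class Clause {
        final Approximation approximation;
        final IntPredicate matches; // stands in for costly work such as reading positions
        int confirmations = 0;      // how many times the costly check ran

        Clause(Approximation approximation, IntPredicate matches) {
            this.approximation = approximation;
            this.matches = matches;
        }

        boolean confirm(int doc) {
            confirmations++;
            return matches.test(doc);
        }
    }

    /** Candidates first, first + step, ... below maxDoc (assumes first >= 0, step >= 1). */
    static Approximation stepping(int first, int step, int maxDoc) {
        return target -> {
            if (target >= maxDoc) return NO_MORE_DOCS;
            int doc = target <= first
                ? first
                : first + ((target - first + step - 1) / step) * step;
            return doc < maxDoc ? doc : NO_MORE_DOCS;
        };
    }

    /** Phase 1: zig-zag over the approximations. Phase 2: confirm agreed candidates. */
    static int countMatches(Clause a, Clause b) {
        int count = 0;
        int doc = a.approximation.advance(0);
        while (doc != NO_MORE_DOCS) {
            int other = b.approximation.advance(doc);
            if (other == doc) {
                // Both approximations agree, so only now pay for the costly checks.
                if (a.confirm(doc) && b.confirm(doc)) count++;
                doc = a.approximation.advance(doc + 1);
            } else {
                // Skip ahead without confirming anything in between.
                doc = a.approximation.advance(other);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The example from the description: a term on odd docs, a phrase on even docs.
        Clause term = new Clause(stepping(1, 2, 10), doc -> true);
        Clause phrase = new Clause(stepping(0, 1, 10), doc -> doc % 2 == 0);
        System.out.println("matches=" + countMatches(term, phrase)
            + ", phrase checks=" + phrase.confirmations);
    }
}
```

In this sketch the phrase's costly check runs only on the odd candidates that the term also proposes. A one-phase phrase scorer asked to advance to an odd document would have to verify positions on every document it steps through until it finds a real match.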


