[jira] [Commented] (LUCENE-5049) Native (C++) implementation of "pure OR" BooleanQuery

Robert Muir (JIRA) Sun, 09 Jun 2013 15:42:22 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679210#comment-13679210 ]


Robert Muir commented on LUCENE-5049:
-------------------------------------

This is an apples vs oranges comparison.

If you write one huge hairy Java method with hardcoded query (OR) + hardcoded 
PostingsFormat (Lucene42) + hardcoded Directory (MMap) + hardcoded Similarity 
(Default) that only works if all terms are against a single field, it would be 
much faster in Java too...
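For a concrete picture of the kind of hardwired loop being compared, here is a minimal C++ sketch of a "pure OR" top-N search over a single field. It assumes each term's postings are already decoded into (doc, score contribution) pairs sorted by doc ID, so it deliberately skips everything a real PostingsFormat, Directory and Similarity would do. `Posting`, `Hit` and `topNOr` are hypothetical names; this is not the attached LUCENE-5049.patch, and the document-at-a-time heap merge used here may differ from how Lucene's own BooleanScorer handles SHOULD clauses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// One posting: a doc ID plus this term's precomputed score contribution.
// (Hypothetical: a real scorer derives this from tf, norms and Similarity.)
struct Posting {
    int doc;
    float score;
};

// One result hit.
struct Hit {
    int doc;
    float score;
};

// Document-at-a-time OR over SHOULD-only term clauses: a doc matching at
// least one term scores the sum of its per-term contributions, and the n
// best hits are kept in a min-heap. Each postings list must be sorted by
// ascending doc ID.
std::vector<Hit> topNOr(const std::vector<std::vector<Posting>>& postings,
                        std::size_t n) {
    assert(n > 0);

    // Cursor heap ordered by current doc ID: (doc, term index).
    using Cursor = std::pair<int, std::size_t>;
    std::priority_queue<Cursor, std::vector<Cursor>, std::greater<Cursor>> cursors;
    std::vector<std::size_t> pos(postings.size(), 0);
    for (std::size_t t = 0; t < postings.size(); ++t) {
        if (!postings[t].empty()) cursors.push({postings[t][0].doc, t});
    }

    // Min-heap on score: the weakest of the current top n sits on top.
    auto worse = [](const Hit& a, const Hit& b) { return a.score > b.score; };
    std::priority_queue<Hit, std::vector<Hit>, decltype(worse)> best(worse);

    while (!cursors.empty()) {
        const int doc = cursors.top().first;
        float sum = 0.0f;
        // Pop every cursor positioned on this doc, add its score, advance it.
        while (!cursors.empty() && cursors.top().first == doc) {
            const std::size_t t = cursors.top().second;
            cursors.pop();
            sum += postings[t][pos[t]].score;
            if (++pos[t] < postings[t].size()) {
                cursors.push({postings[t][pos[t]].doc, t});
            }
        }
        // Docs arrive in ascending order, so on a score tie the earlier
        // (lower) doc ID is kept.
        if (best.size() < n) {
            best.push({doc, sum});
        } else if (sum > best.top().score) {
            best.pop();
            best.push({doc, sum});
        }
    }

    // Drain the heap; return hits best-first, ties broken by lower doc ID.
    std::vector<Hit> hits;
    while (!best.empty()) {
        hits.push_back(best.top());
        best.pop();
    }
    std::sort(hits.begin(), hits.end(), [](const Hit& a, const Hit& b) {
        return a.score != b.score ? a.score > b.score : a.doc < b.doc;
    });
    return hits;
}
```

Called with the postings lists {(1, 1.0), (3, 2.0), (5, 0.5)} and {(3, 1.5), (4, 3.0), (5, 0.5)} and n = 2, it returns doc 3 (score 3.5) followed by doc 4 (score 3.0). Everything the comment lists as hardcoded (codec, directory, similarity, single field) is simply absent from this loop.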
                
> Native (C++) implementation of "pure OR" BooleanQuery
> -----------------------------------------------------
>
>                 Key: LUCENE-5049
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5049
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5049.patch
>
>
> I've been playing with a C++ implementation of BooleanQuery containing
> only OR'd (SHOULD) TermQuery clauses, collecting top N hits by score.
> The results are impressive: ~3X speedup for BQ OR over two terms, and
> good speedups (~38-78%) for Fuzzy1/2 as well, since they rewrite
> to BQ OR over N terms:
> {noformat}
>                     Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
>                  MedTerm       69.47     (15.8%)       68.61     (13.4%)   -1.2% ( -26% -   33%)
>                 HighTerm       55.25     (16.2%)       54.63     (13.9%)   -1.1% ( -26% -   34%)
>                  LowTerm      333.10      (9.6%)      329.43      (8.0%)   -1.1% ( -17% -   18%)
>                   IntNRQ        3.37      (2.6%)        3.36      (4.6%)   -0.2% (  -7% -    7%)
>                  Prefix3       18.91      (2.0%)       19.04      (3.5%)    0.7% (  -4% -    6%)
>                 Wildcard       29.40      (1.7%)       29.70      (2.8%)    1.0% (  -3% -    5%)
>                MedPhrase      132.69      (6.2%)      134.66      (7.0%)    1.5% ( -11% -   15%)
>         HighSloppyPhrase        0.82      (3.6%)        0.83      (3.5%)    1.9% (  -5% -    9%)
>              AndHighHigh       19.65      (0.6%)       20.02      (0.8%)    1.9% (   0% -    3%)
>               HighPhrase       11.74      (6.6%)       11.96      (7.1%)    1.9% ( -11% -   16%)
>          MedSloppyPhrase       29.09      (1.2%)       29.76      (1.9%)    2.3% (   0% -    5%)
>          LowSloppyPhrase       25.71      (1.4%)       26.98      (1.7%)    4.9% (   1% -    8%)
>                  Respell      173.78      (3.0%)      182.41      (3.7%)    5.0% (  -1% -   12%)
>              MedSpanNear       27.67      (2.5%)       29.07      (2.4%)    5.1% (   0% -   10%)
>             HighSpanNear        2.95      (2.4%)        3.10      (2.8%)    5.4% (   0% -   10%)
>              LowSpanNear        8.29      (3.4%)        8.82      (3.3%)    6.4% (   0% -   13%)
>               AndHighMed       79.32      (1.6%)       84.44      (1.0%)    6.5% (   3% -    9%)
>                LowPhrase       23.20      (2.0%)       25.14      (1.6%)    8.4% (   4% -   12%)
>               AndHighLow      594.17      (3.4%)      660.32      (1.9%)   11.1% (   5% -   16%)
>                   Fuzzy2       88.32      (6.4%)      121.44      (1.7%)   37.5% (  27% -   48%)
>                   Fuzzy1       86.34      (6.0%)      153.49      (1.7%)   77.8% (  66% -   90%)
>               OrHighHigh       16.29      (2.5%)       48.29      (1.3%)  196.5% ( 188% -  205%)
>                OrHighMed       28.98      (2.7%)       87.81      (0.9%)  203.0% ( 194% -  212%)
>                OrHighLow       27.38      (2.6%)       84.94      (1.1%)  210.3% ( 201% -  219%)
> {noformat}
> This is essentially a scaled-back attempt at LUCENE-1594 in that it's
> "hardwired" to "just" the "OR of TermQuery" case.
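The "Pct diff" column and the "~3X speedup" in the description are two views of the same numbers. A small C++ sketch, assuming "Pct diff" is the relative change (comp - base) / base * 100 (this reproduces the printed values from the rounded QPS figures to within 0.1), makes the relationship explicit: speedup = comp / base = 1 + pct / 100. `pctDiff` and `speedup` are hypothetical helper names, and the parenthesized StdDev-based ranges are not modeled.

```cpp
#include <cassert>

// Assumed definition of the "Pct diff" column: percent change in QPS of
// the competitor (comp) relative to the baseline (base).
double pctDiff(double qpsBase, double qpsComp) {
    assert(qpsBase > 0.0);
    return (qpsComp - qpsBase) / qpsBase * 100.0;
}

// Speedup factor: how many times more queries per second comp sustains.
// Equivalent to 1 + pctDiff(qpsBase, qpsComp) / 100.
double speedup(double qpsBase, double qpsComp) {
    assert(qpsBase > 0.0);
    return qpsComp / qpsBase;
}
```

For OrHighHigh, pctDiff(16.29, 48.29) comes out at about 196.4 (the table prints 196.5%, presumably computed from unrounded means) and speedup(16.29, 48.29) at about 2.96, i.e. roughly 3X.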

