Michael McCandless created LUCENE-5049:
------------------------------------------

             Summary: Native (C++) implementation of "pure OR" BooleanQuery
                 Key: LUCENE-5049
                 URL: https://issues.apache.org/jira/browse/LUCENE-5049
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael McCandless
            Assignee: Michael McCandless


I've been playing with a C++ implementation of BooleanQuery containing
only OR'd (SHOULD) TermQuery clauses, collecting top N hits by score.

The results are impressive: a ~3X speedup for BQ OR over two terms
(the OrHigh* rows, where a Pct diff of about +200% means roughly 3X
the baseline QPS), and good speedups (~38-78%) for Fuzzy1/2, since
they rewrite to BQ OR over N terms:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 MedTerm       69.47     (15.8%)       68.61     (13.4%)   -1.2% ( -26% -   33%)
                HighTerm       55.25     (16.2%)       54.63     (13.9%)   -1.1% ( -26% -   34%)
                 LowTerm      333.10      (9.6%)      329.43      (8.0%)   -1.1% ( -17% -   18%)
                  IntNRQ        3.37      (2.6%)        3.36      (4.6%)   -0.2% (  -7% -    7%)
                 Prefix3       18.91      (2.0%)       19.04      (3.5%)    0.7% (  -4% -    6%)
                Wildcard       29.40      (1.7%)       29.70      (2.8%)    1.0% (  -3% -    5%)
               MedPhrase      132.69      (6.2%)      134.66      (7.0%)    1.5% ( -11% -   15%)
        HighSloppyPhrase        0.82      (3.6%)        0.83      (3.5%)    1.9% (  -5% -    9%)
             AndHighHigh       19.65      (0.6%)       20.02      (0.8%)    1.9% (   0% -    3%)
              HighPhrase       11.74      (6.6%)       11.96      (7.1%)    1.9% ( -11% -   16%)
         MedSloppyPhrase       29.09      (1.2%)       29.76      (1.9%)    2.3% (   0% -    5%)
         LowSloppyPhrase       25.71      (1.4%)       26.98      (1.7%)    4.9% (   1% -    8%)
                 Respell      173.78      (3.0%)      182.41      (3.7%)    5.0% (  -1% -   12%)
             MedSpanNear       27.67      (2.5%)       29.07      (2.4%)    5.1% (   0% -   10%)
            HighSpanNear        2.95      (2.4%)        3.10      (2.8%)    5.4% (   0% -   10%)
             LowSpanNear        8.29      (3.4%)        8.82      (3.3%)    6.4% (   0% -   13%)
              AndHighMed       79.32      (1.6%)       84.44      (1.0%)    6.5% (   3% -    9%)
               LowPhrase       23.20      (2.0%)       25.14      (1.6%)    8.4% (   4% -   12%)
              AndHighLow      594.17      (3.4%)      660.32      (1.9%)   11.1% (   5% -   16%)
                  Fuzzy2       88.32      (6.4%)      121.44      (1.7%)   37.5% (  27% -   48%)
                  Fuzzy1       86.34      (6.0%)      153.49      (1.7%)   77.8% (  66% -   90%)
              OrHighHigh       16.29      (2.5%)       48.29      (1.3%)  196.5% ( 188% -  205%)
               OrHighMed       28.98      (2.7%)       87.81      (0.9%)  203.0% ( 194% -  212%)
               OrHighLow       27.38      (2.6%)       84.94      (1.1%)  210.3% ( 201% -  219%)
{noformat}

This is essentially a scaled-back attempt at LUCENE-1594, in that it's
hardwired to just the "OR of TermQuery" case.
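
To make the query shape concrete, here is a minimal C++ sketch of an
"OR of terms, keep the top N by score" loop. It is not the code from
this issue and does not use Lucene's API: the Posting and Hit structs,
the searchOr function, and the idea that each posting already carries
its per-term score contribution are all assumptions made for
illustration. It merges the per-term posting lists in docID order, sums
the contributions of every term that matches a document, and keeps the
best N hits in a small heap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; these are not Lucene classes.
struct Posting { int32_t doc; float score; };  // one term's contribution to one doc
struct Hit { int32_t doc; float score; };

// Heap ordering that keeps the *worst* hit on top: lowest score first,
// and on equal scores the higher doc ID counts as worse.
struct WorseOnTop {
  bool operator()(const Hit& a, const Hit& b) const {
    if (a.score != b.score) return a.score > b.score;
    return a.doc < b.doc;
  }
};

// Scores the disjunction of all terms and returns up to topN hits, best first.
// Each inner vector is one term's postings and must be sorted by doc ID.
std::vector<Hit> searchOr(const std::vector<std::vector<Posting>>& terms,
                          std::size_t topN) {
  std::vector<Hit> result;
  if (topN == 0) return result;

  // One cursor per term: (current doc ID, term index), smallest doc ID on top.
  using Cursor = std::pair<int32_t, std::size_t>;
  std::priority_queue<Cursor, std::vector<Cursor>, std::greater<Cursor>> cursors;
  std::vector<std::size_t> pos(terms.size(), 0);
  for (std::size_t t = 0; t < terms.size(); ++t) {
    assert(std::is_sorted(terms[t].begin(), terms[t].end(),
                          [](const Posting& a, const Posting& b) {
                            return a.doc < b.doc;
                          }));
    if (!terms[t].empty()) cursors.push({terms[t][0].doc, t});
  }

  std::priority_queue<Hit, std::vector<Hit>, WorseOnTop> best;
  while (!cursors.empty()) {
    const int32_t doc = cursors.top().first;
    float score = 0.0f;
    // Sum every term positioned on this doc, advancing each of them.
    while (!cursors.empty() && cursors.top().first == doc) {
      const std::size_t t = cursors.top().second;
      cursors.pop();
      score += terms[t][pos[t]].score;
      if (++pos[t] < terms[t].size()) cursors.push({terms[t][pos[t]].doc, t});
    }
    const Hit hit{doc, score};
    if (best.size() < topN) {
      best.push(hit);
    } else if (WorseOnTop{}(hit, best.top())) {  // hit beats the current worst
      best.pop();
      best.push(hit);
    }
  }

  // The heap drains worst-first, so reverse to return best-first.
  while (!best.empty()) {
    result.push_back(best.top());
    best.pop();
  }
  std::reverse(result.begin(), result.end());
  return result;
}
```

For example, with one term matching docs 1, 3, 5 (contributions 1.0,
2.0, 0.5) and another matching docs 2, 3, 6 (contributions 1.5, 1.0,
0.25), searchOr with topN = 2 returns doc 3 (score 3.0) followed by
doc 2 (score 1.5). A real implementation would compute scores from
term statistics and decode postings from the index rather than take
them precomputed.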

