[ https://issues.apache.org/jira/browse/LUCENE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679210#comment-13679210 ]
Robert Muir commented on LUCENE-5049:
-------------------------------------

This is an apples vs oranges comparison.

If you write one huge hairy java method with hardcoded query (OR) + hardcoded Postingsformat (Lucene42) + hardcoded Directory (Mmap) + Hardcoded Similarity (Default) that only works if all terms are against a single field, it would be much faster there too...

> Native (C++) implementation of "pure OR" BooleanQuery
> -----------------------------------------------------
>
>                 Key: LUCENE-5049
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5049
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5049.patch
>
>
> I've been playing with a C++ implementation of BooleanQuery containing
> only OR'd (SHOULD) TermQuery clauses, collecting top N hits by score.
> The results are impressive: ~3X speedup for BQ OR over two terms, and
> also good speedups (~38-78%) for Fuzzy1/2 as well since they rewrite
> to BQ OR over N terms:
> {noformat}
>             Task QPS base   StdDev QPS comp   StdDev Pct diff
>          MedTerm    69.47  (15.8%)    68.61  (13.4%)    -1.2% ( -26% -  33%)
>         HighTerm    55.25  (16.2%)    54.63  (13.9%)    -1.1% ( -26% -  34%)
>          LowTerm   333.10   (9.6%)   329.43   (8.0%)    -1.1% ( -17% -  18%)
>           IntNRQ     3.37   (2.6%)     3.36   (4.6%)    -0.2% (  -7% -   7%)
>          Prefix3    18.91   (2.0%)    19.04   (3.5%)     0.7% (  -4% -   6%)
>         Wildcard    29.40   (1.7%)    29.70   (2.8%)     1.0% (  -3% -   5%)
>        MedPhrase   132.69   (6.2%)   134.66   (7.0%)     1.5% ( -11% -  15%)
> HighSloppyPhrase     0.82   (3.6%)     0.83   (3.5%)     1.9% (  -5% -   9%)
>      AndHighHigh    19.65   (0.6%)    20.02   (0.8%)     1.9% (   0% -   3%)
>       HighPhrase    11.74   (6.6%)    11.96   (7.1%)     1.9% ( -11% -  16%)
>  MedSloppyPhrase    29.09   (1.2%)    29.76   (1.9%)     2.3% (   0% -   5%)
>  LowSloppyPhrase    25.71   (1.4%)    26.98   (1.7%)     4.9% (   1% -   8%)
>          Respell   173.78   (3.0%)   182.41   (3.7%)     5.0% (  -1% -  12%)
>      MedSpanNear    27.67   (2.5%)    29.07   (2.4%)     5.1% (   0% -  10%)
>     HighSpanNear     2.95   (2.4%)     3.10   (2.8%)     5.4% (   0% -  10%)
>      LowSpanNear     8.29   (3.4%)     8.82   (3.3%)     6.4% (   0% -  13%)
>       AndHighMed    79.32   (1.6%)    84.44   (1.0%)     6.5% (   3% -   9%)
>        LowPhrase    23.20   (2.0%)    25.14   (1.6%)     8.4% (   4% -  12%)
>       AndHighLow   594.17   (3.4%)   660.32   (1.9%)    11.1% (   5% -  16%)
>           Fuzzy2    88.32   (6.4%)   121.44   (1.7%)    37.5% (  27% -  48%)
>           Fuzzy1    86.34   (6.0%)   153.49   (1.7%)    77.8% (  66% -  90%)
>       OrHighHigh    16.29   (2.5%)    48.29   (1.3%)   196.5% ( 188% - 205%)
>        OrHighMed    28.98   (2.7%)    87.81   (0.9%)   203.0% ( 194% - 212%)
>        OrHighLow    27.38   (2.6%)    84.94   (1.1%)   210.3% ( 201% - 219%)
> {noformat}
> This is essentially a scaled back attempt at LUCENE-1594 in that it's
> "hardwired" to "just" the "OR of TermQuery" case.
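
For readers unfamiliar with the case under discussion, here is a rough, self-contained Java sketch of what a "hardwired" pure-OR scorer with top-N collection can look like. It is not the code from LUCENE-5049.patch and it does not use any Lucene API: the class name PureOrTopN, the in-memory int[][] postings, and the fixed per-term weights are all assumptions made for illustration, standing in for a real postings format and Similarity.

```java
import java.util.PriorityQueue;

// Hypothetical illustration only: not Lucene code and not the LUCENE-5049 patch.
final class PureOrTopN {

    /**
     * Scores an OR over several terms against a single field and returns the
     * doc IDs of the top n hits, best first.
     *
     * postings[t] lists the doc IDs containing term t (assumed in-memory);
     * weights[t] is a fixed score contribution per match (assumed, in place
     * of a real Similarity).
     */
    static int[] topN(int[][] postings, float[] weights, int maxDoc, int n) {
        // Term-at-a-time accumulation: every matching clause adds to the doc's score.
        float[] scores = new float[maxDoc];
        for (int t = 0; t < postings.length; t++) {
            for (int doc : postings[t]) {
                scores[doc] += weights[t];
            }
        }

        // Min-heap with the weakest hit at the head. On equal scores the
        // higher doc ID counts as weaker, so lower doc IDs win ties.
        PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) -> {
            int c = Float.compare(scores[a], scores[b]);
            return c != 0 ? c : Integer.compare(b, a);
        });
        for (int doc = 0; doc < maxDoc; doc++) {
            if (scores[doc] > 0f) {
                heap.add(doc);
                if (heap.size() > n) {
                    heap.poll(); // evict the current weakest hit
                }
            }
        }

        // Polling yields weakest first, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }
}
```

Everything here is fixed at compile time (one field, one scoring rule, one data layout), which is the kind of specialization the comment above argues would also make a Java implementation much faster than the general BooleanQuery path.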