[ https://issues.apache.org/jira/browse/LUCENE-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068394#comment-13068394 ]
Simon Willnauer edited comment on LUCENE-3328 at 7/20/11 3:37 PM:
------------------------------------------------------------------

bq. here is the same thing, only as a scorer that booleanweight picks.

I like the size of the patch! Thanks for moving this into the weight. I had kept it separate to make BW less complex, but this looks good.

bq. In general i think Query.rewrite should be reserved for simplifying Queries, this is not a simpler query, just a faster scorer

I disagree here: if that were the case, it should be called simplify(Query). In general it's a rewrite method and should not be judged by whether it simplifies or not.

Here are some benchmark results, trunk vs. patch (10M medium wiki docs):

||Task||QPS Trunk||StdDev Trunk||QPS Patch||StdDev Patch||Pct diff||
|Prefix3|29.84|1.14|29.02|1.37|-10% - 5%|
|IntNRQ|5.82|0.67|5.68|0.55|-20% - 20%|
|Wildcard|15.96|0.77|15.62|0.63|-10% - 7%|
|Term|79.10|4.32|77.43|3.25|-11% - 7%|
|OrHighMed|15.67|0.94|15.44|1.00|-13% - 11%|
|TermGroup1M|10.82|0.76|10.77|0.69|-13% - 13%|
|OrHighHigh|3.31|0.37|3.29|0.37|-20% - 24%|
|Respell|15.99|0.59|15.95|0.52|-6% - 6%|
|TermBGroup1M|12.87|1.09|12.86|0.94|-14% - 17%|
|Fuzzy1|24.38|1.19|24.39|0.84|-7% - 8%|
|TermBGroup1M1P|17.67|1.33|17.79|1.14|-12% - 15%|
|Fuzzy2|7.60|0.64|7.67|0.59|-14% - 18%|
|Phrase|6.84|0.64|6.91|0.62|-15% - 21%|
|SpanNear|1.90|0.24|1.92|0.22|-20% - 29%|
|PKLookup|76.01|4.56|76.99|3.26|-8% - 12%|
|SloppyPhrase|2.49|0.25|2.53|0.23|-16% - 23%|
|AndHighMed|29.80|1.11|33.50|1.31|4% - 21%|
|AndHighHigh|10.74|0.67|12.26|0.55|2% - 27%|

> Specialize BooleanQuery if all clauses are TermQueries
> ------------------------------------------------------
>
>                 Key: LUCENE-3328
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3328
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 3.4, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3328.patch, LUCENE-3328.patch
>
>
> During work on LUCENE-3319 I ran into issues with BooleanQuery compared to
> PhraseQuery in the exact case. If I disable scoring on PhraseQuery and bypass
> the position matching, essentially doing a conjunction match,
> ExactPhraseScorer beats the plain boolean scorer by 40%, which is a sizeable gain.
> I converted a ConjunctionScorer to use DocsEnum directly but still didn't get
> all of the 40% from PhraseQuery. Yet it turned out that with further optimizations
> this gets very close to PhraseQuery. The biggest gain here came from
> converting the hand-crafted loop in ConjunctionScorer#doNext to a for loop,
> which seems to be less confusing to HotSpot. In this particular case I think
> code specialization makes lots of sense, since BQ with TQ is by far one of the
> most common queries.
> I will upload a patch shortly
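
For readers unfamiliar with conjunction scoring, here is a minimal sketch of the kind of for-loop-based intersection the description refers to. It is not the code from the attached patch: `ConjunctionSketch` and `DocIdCursor` are hypothetical names, and `DocIdCursor` is only a stand-in for a postings iterator such as DocsEnum, backed by a sorted int array so the example runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a postings iterator over sorted doc IDs. */
    static final class DocIdCursor {
        private final int[] docs;
        private int pos = -1; // -1 means "not positioned yet"

        DocIdCursor(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int nextDoc() {
            pos++;
            return docID();
        }

        /** Moves to the first doc >= target (linear scan, for simplicity). */
        int advance(int target) {
            while (pos < 0 || (pos < docs.length && docs[pos] < target)) {
                pos++;
            }
            return docID();
        }
    }

    /** Returns the doc IDs present in every sorted postings list. */
    static int[] intersect(int[][] postings) {
        if (postings.length == 0) return new int[0];
        DocIdCursor[] cursors = new DocIdCursor[postings.length];
        for (int i = 0; i < postings.length; i++) {
            cursors[i] = new DocIdCursor(postings[i]);
        }
        List<Integer> hits = new ArrayList<>();
        int candidate = cursors[0].nextDoc();
        while (candidate != NO_MORE_DOCS) {
            boolean allMatch = true;
            // Plain for loop over the remaining cursors.
            for (int i = 1; i < cursors.length; i++) {
                DocIdCursor c = cursors[i];
                int doc = c.docID() < candidate ? c.advance(candidate) : c.docID();
                if (doc > candidate) {
                    // Cursor i skipped past the candidate: move the lead cursor
                    // to the new target and re-check every cursor.
                    candidate = cursors[0].advance(doc);
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                hits.add(candidate);
                candidate = cursors[0].nextDoc();
            }
        }
        int[] result = new int[hits.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = hits.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] postings = {
            {1, 3, 5, 7, 9, 20},
            {3, 4, 5, 9, 10, 20},
            {0, 3, 9, 20, 21}
        };
        System.out.println(Arrays.toString(intersect(postings))); // prints [3, 9, 20]
    }
}
```

The sketch collects matches into a list only to keep the example short; a real scorer would expose the matches one at a time and would typically skip with something faster than a linear scan.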