[
https://issues.apache.org/jira/browse/LUCENE-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068394#comment-13068394
]
Simon Willnauer edited comment on LUCENE-3328 at 7/20/11 3:37 PM:
------------------------------------------------------------------
bq. here is the same thing, only as a scorer that booleanweight picks.
I like the size of the patch! Thanks for moving this into the weight. I had
kept it separate to make BW less complex, but this looks good.
bq. In general i think Query.rewrite should be reserved for simplifying
Queries, this is not a simpler query, just a faster scorer
I disagree here; if that were the case, it should be called simplify(Query).
In general it's a rewrite method and should not be judged on whether it
simplifies or not.
Here are some benchmark results trunk vs. patch (10M medium wiki docs):
||Task||QPS Trunk||StdDev Trunk||QPS Patch||StdDev Patch||Pct diff||
|Prefix3|29.84|1.14|29.02|1.37|-10% - 5%|
|IntNRQ|5.82|0.67|5.68|0.55|-20% - 20%|
|Wildcard|15.96|0.77|15.62|0.63|-10% - 7%|
|Term|79.10|4.32|77.43|3.25|-11% - 7%|
|OrHighMed|15.67|0.94|15.44|1.00|-13% - 11%|
|TermGroup1M|10.82|0.76|10.77|0.69|-13% - 13%|
|OrHighHigh|3.31|0.37|3.29|0.37|-20% - 24%|
|Respell|15.99|0.59|15.95|0.52|-6% - 6%|
|TermBGroup1M|12.87|1.09|12.86|0.94|-14% - 17%|
|Fuzzy1|24.38|1.19|24.39|0.84|-7% - 8%|
|TermBGroup1M1P|17.67|1.33|17.79|1.14|-12% - 15%|
|Fuzzy2|7.60|0.64|7.67|0.59|-14% - 18%|
|Phrase|6.84|0.64|6.91|0.62|-15% - 21%|
|SpanNear|1.90|0.24|1.92|0.22|-20% - 29%|
|PKLookup|76.01|4.56|76.99|3.26|-8% - 12%|
|SloppyPhrase|2.49|0.25|2.53|0.23|-16% - 23%|
|AndHighMed|29.80|1.11|33.50|1.31|4% - 21%|
|AndHighHigh|10.74|0.67|12.26|0.55|2% - 27%|
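For readers wondering how to interpret the Pct diff column: the ranges in the table appear consistent with comparing the patch one standard deviation low against trunk one standard deviation high (and the reverse), then truncating to a whole percent. That reading is an assumption inferred from the numbers, not something stated in the comment, and the class and method names below are hypothetical. A minimal sketch:

```java
public final class PctDiffRange {

    // Assumed pessimistic bound: patch one stddev low vs. trunk one stddev high,
    // expressed as a percentage change and truncated toward zero.
    static int low(double trunkQps, double trunkStd, double patchQps, double patchStd) {
        return (int) (100.0 * ((patchQps - patchStd) / (trunkQps + trunkStd) - 1.0));
    }

    // Assumed optimistic bound: patch one stddev high vs. trunk one stddev low.
    static int high(double trunkQps, double trunkStd, double patchQps, double patchStd) {
        return (int) (100.0 * ((patchQps + patchStd) / (trunkQps - trunkStd) - 1.0));
    }

    public static void main(String[] args) {
        // AndHighMed row: trunk 29.80 +/- 1.11, patch 33.50 +/- 1.31
        System.out.println("AndHighMed: "
                + low(29.80, 1.11, 33.50, 1.31) + "% - "
                + high(29.80, 1.11, 33.50, 1.31) + "%");
    }
}
```

Under this reading, only the two And* tasks have a range that stays entirely above zero, so they are the only rows showing a gain that exceeds the run-to-run noise.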
> Specialize BooleanQuery if all clauses are TermQueries
> ------------------------------------------------------
>
> Key: LUCENE-3328
> URL: https://issues.apache.org/jira/browse/LUCENE-3328
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 3.4, 4.0
> Reporter: Simon Willnauer
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3328.patch, LUCENE-3328.patch
>
>
> During work on LUCENE-3319 I ran into issues with BooleanQuery compared to
> PhraseQuery in the exact case. If I disable scoring on PhraseQuery and bypass
> the position matching, essentially doing a conjunction match,
> ExactPhraseScorer beats the plain boolean scorer by 40%, which is a sizeable
> gain. I converted a ConjunctionScorer to use DocsEnum directly but still
> didn't get the full 40% from PhraseQuery. Yet it turned out that with further
> optimizations this gets very close to PhraseQuery. The biggest gain here came
> from converting the hand-crafted loop in ConjunctionScorer#doNext to a for
> loop, which seems to be less confusing to HotSpot. In this particular case I
> think code specialization makes a lot of sense, since BQ with TQ is by far one
> of the most common queries.
> I will upload a patch shortly.
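
To illustrate the kind of conjunction match the description refers to, here is a simplified sketch of intersecting sorted doc ID lists with a plain counted for loop over the sub-iterators. It is not the code from the attached patch and does not use Lucene's classes: `Postings` is a hypothetical stand-in for a postings iterator with `nextDoc`/`advance` semantics, and `intersect` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

public final class ConjunctionSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a postings iterator over sorted doc IDs. */
    static final class Postings {
        private final int[] docs;
        private int pos = -1;

        Postings(int... docs) { this.docs = docs; }

        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int nextDoc() {
            pos++;
            return docID();
        }

        /** Moves to the first doc >= target; stays put if already there. */
        int advance(int target) {
            while (pos < docs.length && (pos < 0 || docs[pos] < target)) {
                pos++;
            }
            return docID();
        }
    }

    /** Returns the doc IDs present in every iterator. */
    static List<Integer> intersect(Postings... iters) {
        List<Integer> hits = new ArrayList<>();
        if (iters.length == 0) return hits;
        int doc = iters[0].nextDoc();
        while (doc != NO_MORE_DOCS) {
            boolean agreed = true;
            // Counted for loop: ask each remaining iterator to catch up to the candidate.
            for (int i = 1; i < iters.length; i++) {
                int other = iters[i].advance(doc);
                if (other != doc) {
                    // Overshoot: move the lead iterator to the new candidate and retry.
                    doc = iters[0].advance(other);
                    agreed = false;
                    break;
                }
            }
            if (agreed) {
                hits.add(doc);
                doc = iters[0].nextDoc();
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(intersect(
                new Postings(1, 3, 5, 8, 13),
                new Postings(3, 4, 8, 13, 21),
                new Postings(2, 3, 8, 9, 13)));
    }
}
```

Whether a loop of this shape is actually friendlier to the JIT than a hand-rolled one depends on the JVM and workload; the sketch only shows the matching logic, not the measured speedup.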