[jira] [Issue Comment Edited] (LUCENE-3328) Specialize BooleanQuery if all clauses are TermQueries

Simon Willnauer (JIRA) Wed, 20 Jul 2011 08:39:21 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068394#comment-13068394 ]


Simon Willnauer edited comment on LUCENE-3328 at 7/20/11 3:37 PM:
------------------------------------------------------------------

bq. here is the same thing, only as a scorer that booleanweight picks.

I like the size of the patch! Thanks for moving this into the weight. I had 
kept it separate to keep BW less complex, but this looks good.


bq. In general i think Query.rewrite should be reserved for simplifying 
Queries, this is not a simpler query, just a faster scorer 

I disagree here: if that were the case, the method should be called 
simplify(Query). It is a rewrite method and should not be judged by whether 
it simplifies the query or not.
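
To make the alternative under discussion concrete, here is a minimal sketch of the "weight picks the scorer" approach: the query itself is left untouched, and the decision is made when a scorer is requested. All types below (SketchQuery, SketchClause, ScorerChoiceSketch and so on) are hypothetical stand-ins, not Lucene classes, and restricting the fast path to required (MUST) clauses is an assumption based on the issue's focus on conjunctions.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Lucene's classes.
interface SketchQuery {}

final class SketchTermQuery implements SketchQuery {
    final String term;
    SketchTermQuery(String term) { this.term = term; }
}

// Any non-term query (phrase, wildcard, ...) for the purposes of this sketch.
final class SketchOtherQuery implements SketchQuery {}

enum SketchOccur { MUST, SHOULD, MUST_NOT }

final class SketchClause {
    final SketchQuery query;
    final SketchOccur occur;
    SketchClause(SketchQuery query, SketchOccur occur) {
        this.query = query;
        this.occur = occur;
    }
}

final class ScorerChoiceSketch {
    /** True when every clause is a required term query, i.e. a pure term conjunction. */
    static boolean isTermConjunction(List<SketchClause> clauses) {
        if (clauses.isEmpty()) {
            return false;
        }
        for (SketchClause c : clauses) {
            if (c.occur != SketchOccur.MUST || !(c.query instanceof SketchTermQuery)) {
                return false;
            }
        }
        return true;
    }

    /** The query is not rewritten; only the choice of scorer changes. */
    static String pickScorer(List<SketchClause> clauses) {
        return isTermConjunction(clauses)
            ? "specialized-term-conjunction"
            : "generic-boolean";
    }
}
```

With this shape, a boolean query containing a single non-term or optional clause simply falls back to the generic path, which is why the decision can live in the weight without a rewrite step.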



Here are some benchmark results trunk vs. patch (10M medium wiki docs):


||Task||QPS Trunk||StdDev Trunk||QPS Patch||StdDev Patch||Pct diff||
|Prefix3|29.84|1.14|29.02|1.37|-10% - 5%|
|IntNRQ|5.82|0.67|5.68|0.55|-20% - 20%|
|Wildcard|15.96|0.77|15.62|0.63|-10% - 7%|
|Term|79.10|4.32|77.43|3.25|-11% - 7%|
|OrHighMed|15.67|0.94|15.44|1.00|-13% - 11%|
|TermGroup1M|10.82|0.76|10.77|0.69|-13% - 13%|
|OrHighHigh|3.31|0.37|3.29|0.37|-20% - 24%|
|Respell|15.99|0.59|15.95|0.52|-6% - 6%|
|TermBGroup1M|12.87|1.09|12.86|0.94|-14% - 17%|
|Fuzzy1|24.38|1.19|24.39|0.84|-7% - 8%|
|TermBGroup1M1P|17.67|1.33|17.79|1.14|-12% - 15%|
|Fuzzy2|7.60|0.64|7.67|0.59|-14% - 18%|
|Phrase|6.84|0.64|6.91|0.62|-15% - 21%|
|SpanNear|1.90|0.24|1.92|0.22|-20% - 29%|
|PKLookup|76.01|4.56|76.99|3.26|-8% - 12%|
|SloppyPhrase|2.49|0.25|2.53|0.23|-16% - 23%|
|AndHighMed|29.80|1.11|33.50|1.31|4% - 21%|
|AndHighHigh|10.74|0.67|12.26|0.55|2% - 27%|
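
For readers wondering how to read the Pct diff column: the ranges shown are consistent with comparing the patch at one standard deviation below its mean against trunk at one standard deviation above (the low end), and the reverse (the high end), truncated to a whole percent. That formula is inferred from the numbers in the table, not stated anywhere in this thread, and the class and method names below are made up for the sketch. A couple of borderline rows (for example Wildcard and TermGroup1M) come out one point off when recomputed this way, presumably because the displayed inputs are already rounded.

```java
// Sketch of an inferred formula; PctDiffRangeSketch is a hypothetical name.
final class PctDiffRangeSketch {

    /** Pessimistic end: patch one stddev slower vs. trunk one stddev faster. */
    static int low(double qpsTrunk, double sdTrunk, double qpsPatch, double sdPatch) {
        // The (int) cast truncates toward zero, e.g. 4.14 -> 4 and -10.75 -> -10.
        return (int) (100.0 * ((qpsPatch - sdPatch) / (qpsTrunk + sdTrunk) - 1.0));
    }

    /** Optimistic end: patch one stddev faster vs. trunk one stddev slower. */
    static int high(double qpsTrunk, double sdTrunk, double qpsPatch, double sdPatch) {
        return (int) (100.0 * ((qpsPatch + sdPatch) / (qpsTrunk - sdTrunk) - 1.0));
    }

    public static void main(String[] args) {
        // AndHighMed row from the table above.
        System.out.println(low(29.80, 1.11, 33.50, 1.31) + "% - "
            + high(29.80, 1.11, 33.50, 1.31) + "%"); // prints "4% - 21%"
    }
}
```

Read this way, only the two And* tasks have a range that stays entirely above zero; every other task's range straddles zero, so those differences are within noise.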

> Specialize BooleanQuery if all clauses are TermQueries
> ------------------------------------------------------
>
>                 Key: LUCENE-3328
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3328
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 3.4, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3328.patch, LUCENE-3328.patch
>
>
> During work on LUCENE-3319 I ran into issues with BooleanQuery compared to 
> PhraseQuery in the exact case. If I disable scoring on PhraseQuery and bypass 
> the position matching, essentially doing a conjunction match, 
> ExactPhraseScorer beats the plain boolean scorer by 40%, which is a sizeable 
> gain. I converted a ConjunctionScorer to use DocsEnum directly but still 
> didn't get the full 40% from PhraseQuery. Yet it turned out that, with 
> further optimizations, this gets very close to PhraseQuery. The biggest gain 
> here came from converting the hand-crafted loop in ConjunctionScorer#doNext 
> to a for loop, which seems to be less confusing to HotSpot. In this 
> particular case I think code specialization makes a lot of sense, since BQ 
> with TQ is by far one of the most common queries.
> I will upload a patch shortly.
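
For context on what a conjunction match over postings looks like, here is a minimal self-contained sketch of a leapfrog-style doNext written as a for loop. It is not the code from the attached patch: Postings is a hypothetical cursor over a sorted int array standing in for a postings enum, and its advance() is simplified (calling it with a target at or below the current doc just stays put).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not Lucene's ConjunctionScorer or DocsEnum.
final class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical cursor over sorted doc ids. */
    static final class Postings {
        private final int[] docs;
        private int pos = -1;

        Postings(int... docs) { this.docs = docs; }

        /** Steps to the next doc id, or NO_MORE_DOCS when exhausted. */
        int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves to the first doc id >= target (linear scan, for simplicity). */
        int advance(int target) {
            if (pos < 0) pos = 0;
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Starting from the lead's current doc, finds the next doc all cursors share. */
    static int doNext(Postings lead, Postings[] others, int doc) {
        advanceLead:
        for (;;) {
            if (doc == NO_MORE_DOCS) return NO_MORE_DOCS;
            for (Postings other : others) {
                int otherDoc = other.advance(doc);
                if (otherDoc > doc) {
                    // Candidate rejected: move the lead past it and start over.
                    doc = lead.advance(otherDoc);
                    continue advanceLead;
                }
            }
            return doc; // every cursor is positioned on doc
        }
    }

    /** Collects all doc ids present in every postings list. */
    static List<Integer> intersect(Postings lead, Postings... others) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = doNext(lead, others, lead.nextDoc());
             doc != NO_MORE_DOCS;
             doc = doNext(lead, others, lead.nextDoc())) {
            hits.add(doc);
        }
        return hits;
    }
}
```

In practice the lead would usually be the rarest term, so that the other cursors are advanced as few times as possible; the sketch leaves that ordering to the caller.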


