[jira] [Commented] (LUCENE-3328) Specialize BooleanQuery if all clauses are TermQueries

Simon Willnauer (JIRA) Wed, 20 Jul 2011 23:09:42 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068821#comment-13068821 ]


Simon Willnauer commented on LUCENE-3328:
-----------------------------------------

bq. I'm wondering if you considered having ConjunctionTermScorer use the terms' 
IDF values to decide which iterator to advance when all are on the same docID? 
It should always be best to pick the rarest term.

The ConjunctionTermScorer sorts the DocsEnums by their document frequency in the
ctor, so the lead will always be the least frequent term in the set. Is this what
you mean here?
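A minimal sketch of that constructor-time ordering, assuming a hypothetical `Postings` class (a term plus a sorted `int[]` of doc IDs standing in for a DocsEnum; it is not Lucene's API). Sorting by document frequency puts the rarest term at index 0, where it serves as the lead:

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: order per-term postings so the rarest term becomes the lead. */
final class LeadOrdering {

  /** Simplified DocsEnum stand-in: a term plus its sorted doc IDs (not Lucene's API). */
  static final class Postings {
    final String term;
    final int[] docs;

    Postings(String term, int... docs) {
      this.term = term;
      this.docs = docs;
    }

    /** Document frequency: how many documents contain the term. */
    int docFreq() {
      return docs.length;
    }
  }

  /** Returns a copy sorted by ascending docFreq; element 0 is the lead. */
  static Postings[] orderByDocFreq(Postings... postings) {
    Postings[] sorted = postings.clone();
    Arrays.sort(sorted, Comparator.comparingInt(Postings::docFreq));
    return sorted;
  }
}
```

With postings for `the` (5 docs), `fast` (3 docs) and `lucene` (2 docs), `orderByDocFreq(...)[0]` is `lucene`. Because the lead drives the iteration, the followers are only asked to `advance()` to candidates that the rarest term already contains.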

We could even optimize the doNext loop and advance the lead to the document at
which we stepped out of the inner loop, since that document is guaranteed to be
greater than the document the lead enum is on. I just wonder whether at some
point we run into the slowness of DocsEnum#advance(). It is very important to
make #advance(doc+1) as fast as #nextDoc() in order to keep our algs clean!
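The suggested doNext change can be sketched as below. This is a hypothetical illustration, not the code from the attached patches: `DocIterator` is a simplified array-backed stand-in for DocsEnum (its `advance()` is a linear scan for brevity), `NO_MORE_DOCS` mirrors the `Integer.MAX_VALUE` sentinel Lucene iterators use, and `iters[0]` is assumed to be the lead. When a follower steps past the candidate doc, the lead calls `advance()` with that follower's doc instead of `nextDoc()`, skipping every lead doc that cannot match:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: leapfrog conjunction whose lead jumps to a follower's overshoot doc. */
final class LeapfrogConjunction {
  /** Mirrors the NO_MORE_DOCS sentinel (Integer.MAX_VALUE) used by Lucene iterators. */
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Simplified DocsEnum stand-in over a sorted array of doc IDs (not Lucene's API). */
  static final class DocIterator {
    private final int[] docs;
    private int idx = -1;
    private int doc = -1;

    DocIterator(int... docs) {
      this.docs = docs;
    }

    int docID() {
      return doc;
    }

    int nextDoc() {
      idx++;
      doc = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
      return doc;
    }

    /** Moves to the first doc >= target; a linear scan keeps the sketch short. */
    int advance(int target) {
      while (doc < target) {
        nextDoc();
      }
      return doc;
    }
  }

  /** Starting from the lead's current doc, returns the next doc every iterator shares. */
  static int doNext(DocIterator[] iters, int doc) {
    advanceLead:
    while (doc != NO_MORE_DOCS) {
      for (int i = 1; i < iters.length; i++) {
        int other = iters[i].docID();
        if (other < doc) {
          other = iters[i].advance(doc);
        }
        if (other > doc) {
          // Follower overshot: no lead doc before `other` can match, so jump the lead there.
          doc = iters[0].advance(other);
          continue advanceLead;
        }
      }
      return doc; // every follower is positioned on doc
    }
    return NO_MORE_DOCS;
  }

  /** Collects all docs shared by every iterator (iterators ordered rarest first). */
  static List<Integer> intersect(DocIterator... iters) {
    List<Integer> hits = new ArrayList<>();
    for (int doc = doNext(iters, iters[0].nextDoc());
        doc != NO_MORE_DOCS;
        doc = doNext(iters, iters[0].nextDoc())) {
      hits.add(doc);
    }
    return hits;
  }
}
```

For example, intersecting `{3, 5, 9}`, `{1, 2, 3, 5, 8, 9, 10}` and `{2, 3, 4, 5, 9}` yields `[3, 5, 9]`. The jump replaces `nextDoc()` calls on the lead with `advance()` calls, so its benefit depends on how fast `DocsEnum#advance()` is, which is exactly the concern raised above.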

> Specialize BooleanQuery if all clauses are TermQueries
> ------------------------------------------------------
>
>                 Key: LUCENE-3328
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3328
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 3.4, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3328.patch, LUCENE-3328.patch, LUCENE-3328.patch
>
>
> During work on LUCENE-3319 I ran into issues with BooleanQuery compared to
> PhraseQuery in the exact case. If I disable scoring on PhraseQuery and bypass
> the position matching, essentially doing a conjunction match,
> ExactPhraseScorer beats the plain boolean scorer by 40%, which is a sizeable
> gain. I converted a ConjunctionScorer to use DocsEnum directly but still
> didn't get all of the 40% from PhraseQuery. However, it turned out that with
> further optimizations this gets very close to PhraseQuery. The biggest gain
> here came from converting the hand-crafted loop in ConjunctionScorer#doNext
> to a for loop, which seems to be less confusing to HotSpot. In this
> particular case I think code specialization makes a lot of sense, since BQ
> with TQ clauses is one of the most common queries by far.
> I will upload a patch shortly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
