[jira] Commented: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment

Yonik Seeley (JIRA) Thu, 14 Oct 2010 12:13:55 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921081#action_12921081 ]


Yonik Seeley commented on LUCENE-2690:
--------------------------------------

bq. For random queries it had a huge positive impact on query perf. 

If the clauses were just term queries, that would make me really suspect the 
test.
If they were MTQ queries, then MTQ should sort, not BQ.

bq. The BQ cloning/reordering was not measurable.

Right - I would expect that for typical queries and typical uses.
I guess I'm worried about the atypical cases since I've seen so many of them - 
people putting together single boolean queries with 10K clauses, people doing 
complex nested queries with thousands of terms, or people executing thousands 
of queries per request (or per document added, via memory index) where this 
overhead suddenly becomes significant.

bq. We are still working on this patch, it's marked as TODO, so we will 
investigate further.

Cool :-)

> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690-attributes.patch, 
> LUCENE-2690-attributes.patch, LUCENE-2690-attributes.patch, 
> LUCENE-2690-hack.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using 
> TopTermsBooleanQueryRewrite), the auto constant rewrite method and the 
> ScoringBQ rewrite methods using a MultiFields wrapper on the top-level 
> reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses 
> some additional data structures (hashed sets/maps) to exclude duplicate terms. 
> All tests currently pass, but FuzzyQuery's tests should not, because its 
> minimum score handling depends on the terms being collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
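
For readers following the description above, here is a minimal sketch of the general idea of collecting terms per segment and using a hashed map to exclude duplicates. It is not the patch's code and uses no Lucene API: the class PerSegmentTermMerge, the method mergeTerms, and the representation of a segment as a map from term text to document frequency are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerSegmentTermMerge {

    // Hypothetical model: each segment is a map of matching term text -> docFreq
    // within that segment. A real index would enumerate terms per segment reader.
    static Map<String, Integer> mergeTerms(List<Map<String, Integer>> segments) {
        // Hashed map keyed by term: a term seen in several segments is stored once,
        // and its per-segment document frequencies are summed.
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> segment : segments) {
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> segments = new ArrayList<>();
        segments.add(Map.of("lucene", 2, "lucent", 1));
        segments.add(Map.of("lucene", 3, "lucerne", 4));
        // Each distinct term appears once; iteration order is unspecified.
        System.out.println(mergeTerms(segments));
    }
}
```

Note that a hashed map yields the terms in no particular order, so any consumer that relies on terms arriving in sorted order (as the description says FuzzyQuery's minimum score handling does) would need a separate ordering step in a design like this.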

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


