[jira] Commented: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment

Robert Muir (JIRA) Thu, 14 Oct 2010 05:29:08 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920940#action_12920940
 ]


Robert Muir commented on LUCENE-2690:
-------------------------------------

I will play with the latest patch some, and hopefully upload a new one.

The real root of this "tie-break" case is that the priority queue comparison
is "compare by boost, then term text".
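As a rough illustration (not Lucene's actual code), the "boost, then term text"
ordering can be written as a Java Comparator. ScoreTerm and TieBreakDemo are
hypothetical names. The sketch assumes that the smaller term text wins a tie,
so on equal boost the larger text is the less competitive entry and ends up at
the queue's bottom.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class TieBreakDemo {
    // Least competitive first: lower boost sorts toward the bottom; on equal
    // boost, the larger term text is treated as less competitive (assumption).
    static final Comparator<ScoreTerm> LEAST_COMPETITIVE_FIRST = (a, b) -> {
        int byBoost = Float.compare(a.boost, b.boost);
        return byBoost != 0 ? byBoost : b.text.compareTo(a.text);
    };

    public static void main(String[] args) {
        // Min-heap under this comparator: peek() returns the queue's bottom.
        PriorityQueue<ScoreTerm> pq = new PriorityQueue<>(LEAST_COMPETITIVE_FIRST);
        pq.add(new ScoreTerm("apple", 0.5f));
        pq.add(new ScoreTerm("banana", 0.5f));
        pq.add(new ScoreTerm("cherry", 0.9f));
        System.out.println(pq.peek().text); // prints banana
    }
}

// Hypothetical holder for a collected term and its boost.
class ScoreTerm {
    final String text;
    final float boost;

    ScoreTerm(String text, float boost) {
        this.text = text;
        this.boost = boost;
    }
}
```

"apple" and "banana" tie on boost, so the term text decides which one is the
bottom; with boost alone the two would be indistinguishable.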

With the MultiTermsEnum this was no problem, because we look at all terms in
order, so we made MaxNonCompetitiveBoostAttribute just a float.

With per-segment rewrite, however, we can see terms out of order, so the boost
alone is no longer enough to break ties.

So I think if we add the (optional) term text of the previous segment's pq
bottom to the MaxNonCompetitiveBoostAttribute itself, then the enum can
implement the tie-break itself, more cleanly and more efficiently. The rewrite
method (or other consumer) should only set the values of this attribute and
not have to deal with this case.
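A minimal sketch of what the extended attribute could look like, assuming the
bottom's term text is kept as a String next to the float boost.
NonCompetitiveBound, set() and isCompetitive() are hypothetical names, not
Lucene's real MaxNonCompetitiveBoostAttribute API, and the tie-break direction
(smaller text wins) is likewise an assumption. The consumer only calls set();
the enum calls isCompetitive() and handles the tie itself.

```java
class BoundDemo {
    public static void main(String[] args) {
        NonCompetitiveBound bound = new NonCompetitiveBound();
        // Hypothetical pq bottom after the previous segment: "banana" at boost 0.5.
        bound.set(0.5f, "banana");
        System.out.println(bound.isCompetitive(0.5f, "apple"));    // true: wins the tie-break
        System.out.println(bound.isCompetitive(0.5f, "cherry"));   // false: loses the tie-break
        System.out.println(bound.isCompetitive(0.4f, "aardvark")); // false: lower boost
    }
}

// Hypothetical stand-in for the proposed attribute: the bottom's boost plus its
// optional term text. Not Lucene's actual API.
class NonCompetitiveBound {
    private float maxNonCompetitiveBoost = Float.NEGATIVE_INFINITY;
    private String competitiveTerm; // bottom's term text; null when not set

    // Called by the rewrite method (the consumer) once its queue is full.
    void set(float bottomBoost, String bottomTerm) {
        maxNonCompetitiveBoost = bottomBoost;
        competitiveTerm = bottomTerm;
    }

    // Called by the enum to decide whether a term is worth returning.
    boolean isCompetitive(float boost, String term) {
        if (boost != maxNonCompetitiveBoost) {
            return boost > maxNonCompetitiveBoost;
        }
        // Equal boost: same rule as the queue, the smaller term text wins.
        return competitiveTerm != null && term.compareTo(competitiveTerm) < 0;
    }
}
```

When no term text is set, an equal boost is treated as non-competitive, which is
what the float-only attribute could safely assume while terms arrived in order.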


> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690-attributes.patch, 
> LUCENE-2690-attributes.patch, LUCENE-2690-hack.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using 
> TopTermsBooleanQueryRewrite), the auto constant rewrite method and the 
> ScoringBQ rewrite methods using a MultiFields wrapper on the top-level 
> reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses
> some additional data structures (hashed sets/maps) to exclude duplicate terms.
> All tests currently pass, but FuzzyQuery's tests should not, because its
> minimum score handling depends on the terms being collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
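The per-segment approach described in the issue can be sketched as follows,
assuming each segment is modeled as a plain list of (term, boost) pairs.
PerSegmentRewriteDemo, Term and collect() are hypothetical; a HashSet stands in
for the "hashed sets/maps" that exclude terms already collected from an earlier
segment. The example also shows the out-of-order effect: "aardvark" from the
second segment wins a tie-break against "apple" from the first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

class PerSegmentRewriteDemo {
    // Hypothetical (term text, boost) pair collected from one segment.
    static final class Term {
        final String text;
        final float boost;

        Term(String text, float boost) {
            this.text = text;
            this.boost = boost;
        }
    }

    // Least competitive first: lower boost, then (on equal boost) larger term text.
    static final Comparator<Term> LEAST_COMPETITIVE_FIRST = (a, b) -> {
        int c = Float.compare(a.boost, b.boost);
        return c != 0 ? c : b.text.compareTo(a.text);
    };

    // Keeps the maxSize (assumed >= 1) most competitive distinct terms across segments.
    static List<String> collect(List<List<Term>> segments, int maxSize) {
        PriorityQueue<Term> pq = new PriorityQueue<>(LEAST_COMPETITIVE_FIRST);
        Set<String> inQueue = new HashSet<>(); // term texts currently held in the queue
        for (List<Term> segment : segments) {
            for (Term t : segment) {
                if (inQueue.contains(t.text)) {
                    continue; // duplicate: already collected from an earlier segment
                }
                if (pq.size() < maxSize) {
                    pq.add(t);
                    inQueue.add(t.text);
                } else if (LEAST_COMPETITIVE_FIRST.compare(t, pq.peek()) > 0) {
                    inQueue.remove(pq.poll().text); // evict the current bottom
                    pq.add(t);
                    inQueue.add(t.text);
                }
            }
        }
        List<String> result = new ArrayList<>();
        for (Term t : pq) {
            result.add(t.text);
        }
        result.sort(null); // alphabetical order for stable output
        return result;
    }

    public static void main(String[] args) {
        List<List<Term>> segments = List.of(
            List.of(new Term("apple", 0.5f), new Term("banana", 0.5f), new Term("cherry", 0.9f)),
            List.of(new Term("aardvark", 0.5f), new Term("cherry", 0.9f)));
        System.out.println(collect(segments, 2)); // prints [aardvark, cherry]
    }
}
```

Because segments are visited one after another, a smaller term text can show up
after a larger one has already been collected; a float-only bound that assumed
in-order enumeration would wrongly reject "aardvark" here.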

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

