[ https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920940#action_12920940 ]
Robert Muir commented on LUCENE-2690: ------------------------------------- I will play with the latest patch some, and hopefully upload a new one. The real solution to this "tie-break" case really is the fact that the priority queue comparison is "compare by boost, then term text". With the MultiTermsEnum this was no problem, because we look at all terms in order, so we made MaxNonCompetitiveBoostAttribut just a float. With per-segment rewrite, then we can look at terms out-of-order. So I think if we add the optional term text of the pq's bottom for the previous segment to the MaxNonCompetitiveBoostAttribute itself, then the enum itself can implement the tie break, cleaner, and more efficiently. The rewrite method should or consumer should only be setting the values of this attribute and not dealing with this case. > Do MultiTermQuery boolean rewrites per segment > ---------------------------------------------- > > Key: LUCENE-2690 > URL: https://issues.apache.org/jira/browse/LUCENE-2690 > Project: Lucene - Java > Issue Type: Improvement > Affects Versions: 4.0 > Reporter: Uwe Schindler > Assignee: Uwe Schindler > Fix For: 4.0 > > Attachments: LUCENE-2690-attributes.patch, > LUCENE-2690-attributes.patch, LUCENE-2690-hack.patch, LUCENE-2690.patch, > LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, > LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, > LUCENE-2690.patch > > > MultiTermQuery currently rewrites FuzzyQuery (using > TopTermsBooleanQueryRewrite), the auto constant rewrite method and the > ScoringBQ rewrite methods using a MultiFields wrapper on the top-level > reader. This is inefficient. > This patch changes the rewrite modes to do the rewrites per segment and uses > some additional datastructures (hashed sets/maps) to exclude duplicate terms. > All tests currently pass, but FuzzyQuery's tests should not, because it > depends for the minimum score handling, that the terms are collected in > order.. > Robert will fix FuzzyQuery in this issue, too. This patch is just a start. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org