[jira] Commented: (LUCENE-2140) TopTermsScoringBooleanQueryRewrite minscore

Michael McCandless (JIRA) Thu, 10 Dec 2009 02:36:49 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788656#action_12788656
 ]


Michael McCandless commented on LUCENE-2140:
--------------------------------------------

This same sort of optimization would be interesting to explore for Lucene's 
sorting, btw.

E.g., say I'm sorting by an int field, keeping the top 10 results, and I'm 
collecting a lot of hits.  At some point I see that the "bottom" of the queue 
has int value 7.  Because the int[] from FieldCache is RAM resident, it'd 
likely be faster (possibly much faster for complex queries) to jump into the 
field cache, skip forward until you find a doc whose value is < 7, and ask 
the Scorer to advance to that doc.

I.e., there comes a point in the search where the int value of that field is 
a more performant way to drive the scoring.
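A minimal Java sketch of the sorting idea, under stated assumptions: a plain `int[]` stands in for the FieldCache array, the sort is ascending with ties going to the lower doc id, and `ArrayDocIterator` is a hypothetical stand-in for a Scorer whose `advance(target)` returns the first matching doc >= target, in the spirit of `DocIdSetIterator.advance`. Once the queue is full, the loop scans the in-memory values for the next doc that beats the bottom and leapfrogs the scorer there instead of visiting every hit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class FieldCacheSkipSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical Scorer stand-in: iterates matching doc ids in increasing order. */
    static final class ArrayDocIterator {
        private final int[] docs; // sorted matching doc ids
        private int idx = -1;
        int advanceCalls = 0;

        ArrayDocIterator(int[] docs) { this.docs = docs; }

        int nextDoc() {
            idx++;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        /** First matching doc >= target (target must be beyond the current doc). */
        int advance(int target) {
            advanceCalls++;
            do { idx++; } while (idx < docs.length && docs[idx] < target);
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
    }

    /** Doc ids of the topN matches with the smallest values; ties go to the lower doc id. */
    static List<Integer> topNAscending(int[] values, ArrayDocIterator scorer, int topN) {
        List<Integer> result = new ArrayList<>();
        if (topN <= 0) return result;
        // Max-heap on (value, doc): the head is the current "bottom" of the queue.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) ->
            a[0] != b[0] ? Integer.compare(b[0], a[0]) : Integer.compare(b[1], a[1]));
        int doc = scorer.nextDoc();
        while (doc != NO_MORE_DOCS) {
            if (pq.size() < topN) {
                // Queue not full yet: collect every hit.
                pq.add(new int[] {values[doc], doc});
                doc = scorer.nextDoc();
                continue;
            }
            if (values[doc] < pq.peek()[0]) {
                pq.poll(); // evict the bottom
                pq.add(new int[] {values[doc], doc});
            }
            int bottom = pq.peek()[0];
            // Walk the RAM-resident values to the next doc that could still compete...
            int target = doc + 1;
            while (target < values.length && values[target] >= bottom) target++;
            if (target >= values.length) break; // nothing left can enter the queue
            // ...and let the scorer jump there instead of visiting every hit in between.
            doc = scorer.advance(target);
        }
        List<int[]> entries = new ArrayList<>(pq);
        entries.sort((a, b) -> a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]));
        for (int[] e : entries) result.add(e[1]);
        return result;
    }

    public static void main(String[] args) {
        int[] values = {9, 7, 8, 3, 7, 1, 9, 2, 6, 5}; // hypothetical per-doc field values
        ArrayDocIterator scorer = new ArrayDocIterator(new int[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
        System.out.println(topNAscending(values, scorer, 2)); // prints [5, 7]
        System.out.println(scorer.advanceCalls); // prints 3
    }
}
```

With these values the scorer is asked to advance only three times after the queue fills, and docs 4, 6, 8 and 9 are never returned by it. Strict `<` is used against the bottom because docs arrive in increasing id order, so a later doc with an equal value loses the tie.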

I wonder if sorting by relevance could do something similar, e.g., if we know 
at some point that the worst (bottom) relevance in our queue is 2.0, could any 
Scorer use that info to skip forward efficiently?  Maybe only TermScorer, when 
norms aren't in use, though...

> TopTermsScoringBooleanQueryRewrite minscore
> -------------------------------------------
>
>                 Key: LUCENE-2140
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2140
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: Flex Branch
>            Reporter: Robert Muir
>            Priority: Minor
>             Fix For: Flex Branch
>
>
> When using the TopTermsScoringBooleanQueryRewrite (LUCENE-2123), it would be 
> nice if MultiTermQuery could set an attribute specifying the minimum required 
> score once the Priority Queue is filled. 
> This way, FilteredTermsEnums could adjust their behavior based on the minimum 
> score a term needs to actually be useful (i.e., one that doesn't just pass 
> through the pq).
> An example is FuzzyTermsEnum: at some point the bottom of the priority queue 
> contains words with edit distance of 1 and enumerating any further terms is 
> simply a waste of time.
> This is because terms are compared by score, then termtext. So in this case 
> FuzzyTermsEnum could simply seek to the exact match, then end.
> This behavior could also be generalized to all n, for a different impl of 
> FuzzyQuery that only looks in the term dictionary for words within edit 
> distance n, where n is the edit distance of the lowest-scoring term in the pq 
> (adjusting its behavior during term enumeration according to this attribute).
> Other FilteredTermsEnums could make use of this minimal score in their own 
> way, to drive the most efficient behavior so that they do not waste time 
> enumerating useless terms.
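To make the FuzzyTermsEnum argument concrete, here is a minimal Java sketch, assuming a `TreeSet<String>` stands in for the term dictionary (iterated in term order, like a terms enum), the score is simply the Levenshtein distance (lower is better), and `Entry`, `offer` and `topTerms` are hypothetical helpers rather than Lucene API. Once the queue is full, its bottom defines the minimum competitive score: a later term sorts after every queued term, so it only gets in by strictly beating the bottom's distance. When the bottom is at distance 1, only the exact match can still compete, so the loop seeks to it and ends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.PriorityQueue;
import java.util.TreeSet;

class MinCompetitiveFuzzySketch {
    /** Hypothetical queued term: a lower distance is a better score. */
    static final class Entry {
        final String term;
        final int dist;
        Entry(String term, int dist) { this.term = term; this.dist = dist; }
    }

    /** Plain dynamic-programming Levenshtein distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    /** Adds e, then evicts the worst entry if the queue grew past topN. */
    static void offer(PriorityQueue<Entry> pq, Entry e, int topN) {
        pq.add(e);
        if (pq.size() > topN) pq.poll();
    }

    /** Top topN terms within maxEdits of query, ordered by (distance, term text); visited[0] counts terms examined. */
    static List<String> topTerms(NavigableSet<String> dict, String query, int maxEdits, int topN, int[] visited) {
        List<String> result = new ArrayList<>();
        if (topN <= 0) return result;
        // Worst-first heap: the head is the bottom (largest distance, then largest term).
        PriorityQueue<Entry> pq = new PriorityQueue<>((x, y) ->
            x.dist != y.dist ? Integer.compare(y.dist, x.dist) : y.term.compareTo(x.term));
        for (String term : dict) { // sorted order, like a terms enum
            visited[0]++;
            int d = editDistance(term, query);
            // Minimum competitive score: once full, a later term must strictly beat the bottom.
            int limit = pq.size() == topN ? pq.peek().dist - 1 : maxEdits;
            if (d <= limit) offer(pq, new Entry(term, d), topN);
            if (pq.size() == topN && pq.peek().dist <= 1) {
                // Only the exact match (distance 0) can still compete: seek to it, then end.
                if (query.compareTo(term) > 0 && dict.contains(query)) {
                    visited[0]++;
                    offer(pq, new Entry(query, 0), topN);
                }
                break;
            }
        }
        List<Entry> entries = new ArrayList<>(pq);
        entries.sort((x, y) -> x.dist != y.dist ? Integer.compare(x.dist, y.dist) : x.term.compareTo(y.term));
        for (Entry e : entries) result.add(e.term);
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<String> dict = new TreeSet<>(Arrays.asList("bat", "cab", "cat", "cot", "cut", "dog", "hat"));
        int[] visited = {0};
        System.out.println(topTerms(dict, "cat", 2, 2, visited)); // prints [cat, bat]
        System.out.println(visited[0]); // prints 3
    }
}
```

With `topN = 2`, only `bat` and `cab` are enumerated before the seek to `cat`; `cot`, `cut`, `dog` and `hat` are never examined. The shrinking `limit` is the general-n case from the description: a smarter enumerator could use it to avoid visiting out-of-range terms at all, whereas this sketch still scans them.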

-- 
This message is automatically generated by JIRA.