Erick, I think you're right because you'd wouldn't know the max score before the comparisons. I'm just thinking about a rounding algorithm that involves comparing the raw scores to the theoretical maximum score, which I think could be computed from the Similarity class and knowing the max boost value used during indexing.
Peter On 3/1/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
Peter: About a custom ScoreComparator. The problem I couldn't get past was that I needed to know the max score of all the docs in order to divide the raw scores into quintiles since I was dealing with raw scores. I didn't see how to make that work with ScoreComparator, but I confess that I didn't look very hard after someone on the list turned me on to FieldSortedHitQueue.... Erick On 2/28/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > > It may well be, but as I said this is efficient enough for my needs > so I didn't pursue it. One of my pet peeves is spending time making > things "more efficient" when there's no need, and my index isn't > going to grow enough larger to worry about that now <G>... > > Erick > > On 2/28/07, Peter Keegan < [EMAIL PROTECTED]> wrote: > > > > Erich, > > > > Yes, this seems to be the simplest way to implement score > > 'bucketization', > > but wouldn't it be more efficient to do this with a custom > > ScoreComparator? > > That way, you'd do the bucketizing and sorting in one 'step' > > (compare()). > > Maybe the savings isn't measurable, though. A comparator might also > > allow > > one to do a more sophisticated rounding or bucketizing since you'd be > > getting 2 scores at a time. > > > > Peter > > > > > > On 2/28/07, Erick Erickson <[EMAIL PROTECTED] > wrote: > > > > > > Empirically, when I insert the elements in the FieldSortedHitQueue > > > they get sorted according to the Sort object. The original query > > > that gives me a TopDocs applied > > > no secondary sorting, only relevancy. Since I normalized > > > all the scores into one of only 5 discrete values, and secondary > > > sorting was applied to all docs with the same score when I inserted > > > them in the FieldSortedHitQueue. > > > > > > Now popping things of the FieldSortedHitQueue is ordered the > > > way I want. > > > > > > You could just operate on the FieldSortedHitQueue at this point, but > > > I decided the rest of my code would be simpler if I stuffed them back > > > into the TopDocs, so there's some explanation below that you can > > > just skip if I've cleared things up already..... > > > > > > ***************** > > > The step I left out is moving the documents from the > > > FIeldSortedHitQueue back to topDocs.scoreDocs. > > > So the steps are as follows.. > > > > > > 1> "bucketize" the scores. That is, go through the > > > TopDocs.scoreDocs and adjust each raw score into > > > one of my buckets. This is made easy by the > > > existence of topDocs.getMaxScore . TopDocs has > > > had no sorting other than relevancy applied so far. > > > > > > 2> assemble the FieldSortedHitQueue by inserting > > > each element from scoreDocs into it, with a suitable > > > Sort object, relevance is the first field ( SortField.FIELD_SCORE). > > > > > > 3> pop the entries off the FieldSortedHitQueue, overwriting > > > the elements in topDocs.scoreDocs. > > > > > > I left out step <3>, although I suppose you could > > > operate directly on the FieldSortedHitQueue. > > > > > > NOTE: in my case, I just put everything back in the > > > scoreDocs without attempting any efficiencies. If I > > > needed more performance, I'd only put as many items > > > back as I needed to display. But as I wrote yesterday, > > > performance isn't an issue so there's no point. Although > > > I know one place to look if we need to squeeze more QPS. > > > > > > How efficient this is is an open question. But it's "fast enough" > > > and relatively simple so I stopped looking for more > > > efficiencies.... > > > > > > Erick > > > > > > On 2/28/07, Chris Hostetter <[EMAIL PROTECTED] > wrote: > > > > > > > > > > > > : The first part was just to iterate through the TopDocs that's > > > available > > > > to > > > > : my and normalize the scores right in the ScoreDocs. Like this... > > > > > > > > Won't that be done after the Lucene does the hitcollecting/sorting? > > ... > > > he > > > > wants the "bucketing" to happen as part of hte scoring so that the > > > > secondary sort will determine the ordering within the bucket. > > > > > > > > (or am i missing something about your description?) > > > > > > > > > > > > > > > > > > > > -Hoss > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > > > >