Re: Sorting by Score

Peter Keegan Thu, 01 Mar 2007 05:30:32 -0800

Erick,

I think you're right because you'd wouldn't know the max score before the
comparisons. I'm just thinking about a rounding algorithm that involves
comparing the raw scores to the theoretical maximum score, which I think
could be computed from the Similarity class and knowing the max boost value
used during indexing.


Peter

On 3/1/07, Erick Erickson <[EMAIL PROTECTED]> wrote:


Peter:

About a custom ScoreComparator. The problem I couldn't get past was that I
needed to know the max score of all the docs in order to divide the raw
scores into quintiles since I was dealing with raw scores. I didn't see
how
to make that work with ScoreComparator, but I confess that I didn't look
very hard after someone on the list turned me on to
FieldSortedHitQueue....

Erick

On 2/28/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> It may well be, but as I said this is efficient enough for my needs
> so I didn't pursue it. One of my pet peeves is spending time making
> things "more efficient" when there's no need, and my index isn't
> going to grow enough larger to worry about that now <G>...
>
> Erick
>
> On 2/28/07, Peter Keegan < [EMAIL PROTECTED]> wrote:
> >
> > Erich,
> >
> > Yes, this seems to be the simplest way to implement score
> > 'bucketization',
> > but wouldn't it be more efficient to do this with a custom
> > ScoreComparator?
> > That way, you'd do the bucketizing and sorting in one 'step'
> > (compare()).
> > Maybe the savings isn't measurable, though. A comparator might also
> > allow
> > one to do a more sophisticated rounding or bucketizing since you'd be
> > getting 2 scores at a time.
> >
> > Peter
> >
> >
> > On 2/28/07, Erick Erickson <[EMAIL PROTECTED] > wrote:
> > >
> > > Empirically, when I insert the elements in the FieldSortedHitQueue
> > > they get sorted according to the Sort object. The original query
> > > that gives me a TopDocs applied
> > > no secondary sorting, only relevancy. Since I normalized
> > > all the scores into one of only 5 discrete values, and secondary
> > > sorting was applied to all docs with the same score when I inserted
> > > them in the FieldSortedHitQueue.
> > >
> > > Now popping things of the FieldSortedHitQueue is ordered the
> > > way I want.
> > >
> > > You could just operate on the FieldSortedHitQueue at this point, but
> > > I decided the rest of my code would be simpler if I stuffed them
back
> > > into the TopDocs, so there's some explanation below that you can
> > > just skip if I've cleared things up already.....
> > >
> > > *****************
> > > The step I left out is moving the documents from the
> > > FIeldSortedHitQueue back to topDocs.scoreDocs.
> > > So the steps are as follows..
> > >
> > > 1> "bucketize" the scores. That is, go through the
> > > TopDocs.scoreDocs and adjust each raw score into
> > > one of my buckets. This is made easy by the
> > > existence of topDocs.getMaxScore . TopDocs has
> > > had no sorting other than relevancy applied so far.
> > >
> > > 2> assemble the FieldSortedHitQueue by inserting
> > > each element from scoreDocs into it, with a suitable
> > > Sort object, relevance is the first field ( SortField.FIELD_SCORE).
> > >
> > > 3> pop the entries off the FieldSortedHitQueue, overwriting
> > > the elements in topDocs.scoreDocs.
> > >
> > > I left out step <3>, although I suppose you could
> > > operate directly on the FieldSortedHitQueue.
> > >
> > > NOTE: in my case, I just put everything back in the
> > > scoreDocs without attempting any efficiencies. If I
> > > needed more performance, I'd only put as many items
> > > back as I needed to display. But as I wrote yesterday,
> > > performance isn't an issue so there's no point. Although
> > > I know one place to look if we need to squeeze more QPS.
> > >
> > > How efficient this is is an open question. But it's "fast enough"
> > > and relatively simple so I stopped looking for more
> > > efficiencies....
> > >
> > > Erick
> > >
> > > On 2/28/07, Chris Hostetter <[EMAIL PROTECTED] > wrote:
> > > >
> > > >
> > > > : The first part was just to iterate through the TopDocs that's
> > > available
> > > > to
> > > > : my and normalize the scores right in the ScoreDocs. Like this...
> > > >
> > > > Won't that be done after the Lucene does the
hitcollecting/sorting?
> > ...
> > > he
> > > > wants the "bucketing" to happen as part of hte scoring so that the
> > > > secondary sort will determine the ordering within the bucket.
> > > >
> > > > (or am i missing something about your description?)
> > > >
> > > >
> > > >
> > > >
> > > > -Hoss
> > > >
> > > >
> > > >
> > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]
> > > >
> > > >
> > >
> >
>
>

Re: Sorting by Score

Reply via email to