It may well be, but as I said this is efficient enough for my needs
so I didn't pursue it. One of my pet peeves is spending time making
things "more efficient" when there's no need, and my index isn't
going to grow enough larger to worry about that now <G>...

Erick

On 2/28/07, Peter Keegan <[EMAIL PROTECTED]> wrote:

Erich,

Yes, this seems to be the simplest way to implement score 'bucketization',
but wouldn't it be more efficient to do this with a custom
ScoreComparator?
That way, you'd do the bucketizing and sorting in one 'step' (compare()).
Maybe the savings isn't measurable, though. A comparator might also allow
one to do a more sophisticated rounding or bucketizing since you'd be
getting 2 scores at a time.

Peter


On 2/28/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> Empirically, when I insert the elements in the FieldSortedHitQueue
> they get sorted according to the Sort object. The original query
> that gives me a TopDocs applied
> no secondary sorting, only relevancy. Since I normalized
> all the scores into one of only 5 discrete values, and secondary
> sorting was applied to all docs with the same score when I inserted
> them in the FieldSortedHitQueue.
>
> Now popping things of the FieldSortedHitQueue is ordered the
> way I want.
>
> You could just operate on the FieldSortedHitQueue at this point, but
> I decided the rest of my code would be simpler if I stuffed them back
> into the TopDocs, so there's some explanation below that you can
> just skip if I've cleared things up already.....
>
> *****************
> The step I left out is moving the documents from the
> FIeldSortedHitQueue back to topDocs.scoreDocs.
> So the steps are as follows..
>
> 1> "bucketize" the scores. That is, go through the
> TopDocs.scoreDocs and adjust each raw score into
> one of my buckets. This is made easy by the
> existence of topDocs.getMaxScore. TopDocs has
> had no sorting other than relevancy applied so far.
>
> 2> assemble the FieldSortedHitQueue by inserting
> each element from scoreDocs into it, with a suitable
> Sort object, relevance is the first field (SortField.FIELD_SCORE).
>
> 3> pop the entries off the FieldSortedHitQueue, overwriting
> the elements in topDocs.scoreDocs.
>
> I left out step <3>, although I suppose you could
> operate directly on the FieldSortedHitQueue.
>
> NOTE: in my case, I just put everything back in the
> scoreDocs without attempting any efficiencies. If I
> needed more performance, I'd only put as many items
> back as I needed to display. But as I wrote yesterday,
> performance isn't an issue so there's no point. Although
> I know one place to look if we need to squeeze more QPS.
>
> How efficient this is is an open question. But it's "fast enough"
> and relatively simple so I stopped looking for more
> efficiencies....
>
> Erick
>
> On 2/28/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> >
> >
> > : The first part was just to iterate through the TopDocs that's
> available
> > to
> > : my and normalize the scores right in the ScoreDocs. Like this...
> >
> > Won't that be done after the Lucene does the hitcollecting/sorting?
...
> he
> > wants the "bucketing" to happen as part of hte scoring so that the
> > secondary sort will determine the ordering within the bucket.
> >
> > (or am i missing something about your description?)
> >
> >
> >
> >
> > -Hoss
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>

Reply via email to