Re: Lucene's Ranking Function

Doug Cutting Wed, 11 Sep 2002 13:48:04 -0700

Clemens Marschner wrote:
> 1. I think the new document boost is missing, isn't it?
> With that it should be something like
> 
>  score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) * coord_q_d * boost_d
> Is that correct?


Almost.  This should actually be boost_d * boost_d_t, the boost factor 
for the document multiplied by the boost for t's field in d.
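A minimal Java sketch of the corrected formula follows. Because boost_d_t depends on t, it assumes the correction means multiplying each term's contribution by boost_d * boost_d_t inside the sum, rather than by boost_d once outside it. The class, record and parameter names are hypothetical illustrations, not Lucene's API.

```java
import java.util.List;

/** Hypothetical sketch of the per-document score discussed in this thread. */
public class RankingSketch {

    /** Per-term statistics for one (query, document) pair; names are illustrative. */
    record TermStats(double tfQ, double tfD, double idf, double normDT,
                     double boostT, double boostDT) {}

    /** score_d = sum_t(tf_q*idf_t/norm_q * tf_d*idf_t/norm_d_t * boost_t * boost_d * boost_d_t) * coord_q_d */
    static double score(List<TermStats> terms, double normQ,
                        double coordQD, double boostD) {
        double sum = 0.0;
        for (TermStats t : terms) {
            double queryPart = t.tfQ() * t.idf() / normQ;   // query-side factor
            double docPart = t.tfD() * t.idf() / t.normDT(); // document-side factor
            // term boost, document boost and the boost of t's field in d
            sum += queryPart * docPart * t.boostT() * boostD * t.boostDT();
        }
        return sum * coordQD; // coordination factor applied once per document
    }

    public static void main(String[] args) {
        List<TermStats> terms = List.of(
            new TermStats(1, 2, 1.5, 2, 1, 1),
            new TermStats(1, 1, 2.0, 2, 1, 2));
        System.out.println(score(terms, 1.0, 1.0, 1.0)); // 6.25
    }
}
```

Setting every normDT to 1.0 drops the length normalization, which is what question 2 below asks about.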

> 2. If I like the score to be independent of the number of terms in the
> document (regarding them as essentially constant), is it enough to leave out
> the norm_d_t factor?

Yes.  Note however that the quantity called 'norm' in the code is now 
frequently actually norm_d_t * boost_t * boost_d_t.  This quantity is 
now computed at index time and stored in the norms file.

> I have seen that a norm factor between 0 and 255 is read with
> IndexReader.norms() in TermScorer.score(). Is that the one?

Yes, although see my note above.
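The index-time precomputation of the stored norm can be sketched as follows. This is a hypothetical illustration, not Lucene's code. It assumes a length normalization of 1/sqrt(numTerms) and a simple linear mapping of the product onto the byte range 0..255. Such a mapping can only represent products in [0, 1], so a real encoding would need a different (e.g. nonlinear) byte format to preserve boosts above 1. In the thread, the search-time side corresponds to reading this byte via IndexReader.norms() in TermScorer.score(); decode here is only the inverse of this sketch's own encode.

```java
/** Hypothetical index-time norm: length normalization and boosts folded into one byte. */
public class NormSketch {

    // Assumed linear quantization: clamp to [0, 1], then scale onto 0..255.
    static byte encode(float value) {
        float clamped = Math.max(0f, Math.min(1f, value));
        return (byte) Math.round(clamped * 255f);
    }

    // Inverse of encode: read the byte as unsigned (0..255) and rescale to [0, 1].
    static float decode(byte b) {
        return (b & 0xFF) / 255f;
    }

    // Index time: compute the product once per (document, field) and store it.
    static byte indexTimeNorm(int numTerms, float docBoost, float fieldBoost) {
        float lengthNorm = (float) (1.0 / Math.sqrt(numTerms)); // assumed 1/sqrt(numTerms)
        return encode(lengthNorm * docBoost * fieldBoost);
    }

    public static void main(String[] args) {
        byte stored = indexTimeNorm(4, 1f, 1f);
        System.out.println((stored & 0xFF) + " -> " + decode(stored));
    }
}
```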

> From what I further understand (and from digging in Witten/Moffat/Bell) the
> norm_q factor is not calculated, since it stays the same for one query.

Lucene calculates it anyway.  It's cheap to compute: it is multiplied 
together with the term boost and idf once per query term, then this 
weight is used in subsequent computations.
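Computing the per-query-term weight once can be sketched like this. The sketch assumes each query term occurs once (tf_q = 1) and that norm_q is the Euclidean norm of the boost * idf values, so the resulting weights have unit length. The names are hypothetical; this shows the idea, not Lucene's actual classes.

```java
import java.util.Arrays;

/** Hypothetical sketch: precompute one normalized weight per query term. */
public class QueryWeightSketch {

    static double[] termWeights(double[] idf, double[] boost) {
        // First pass: accumulate the squared (boost * idf) of every query term.
        double sumOfSquares = 0.0;
        for (int i = 0; i < idf.length; i++) {
            double w = idf[i] * boost[i];
            sumOfSquares += w * w;
        }
        double normQ = Math.sqrt(sumOfSquares); // assumed Euclidean query norm
        // Second pass: boost * idf / norm_q, reused for every matching document.
        double[] weights = new double[idf.length];
        for (int i = 0; i < idf.length; i++) {
            weights[i] = boost[i] * idf[i] / normQ;
        }
        return weights;
    }

    public static void main(String[] args) {
        double[] w = termWeights(new double[]{3.0, 4.0}, new double[]{1.0, 1.0});
        System.out.println(Arrays.toString(w)); // [0.6, 0.8]
    }
}
```

The per-document loop then only multiplies each precomputed weight by the document-side factors, so norm_q costs one extra pass over the query terms rather than work per document.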

Doug

