Re: The values which compute scores.

Yonik Seeley Wed, 30 May 2007 14:07:28 -0700

On 5/30/07, Walt Stoneburner <[EMAIL PROTECTED]> wrote:

a) Where does freq come from?  (Not what is it, but who computes it and how?)


For a single term, it's determined at index time and stored in the index.
TermDocs gives you a list of documents containing the term, and for
each document, the number of time the term appears (the freq).

Note: TermScorer calls the tf(int) version on Similarity rather than
the tf(float) version.

c) What values should I be passing out of a function like this?
Should I normalize my outgoing scores to some scale, or do I simply
just need to provide numbers that "have the right shaped curve".


Hits normalizes to 1 based on the max score, if that max score is
greater than 1.
Scores across queries aren't really comparable though.

I look at things like idf() which returns 1+log(ratio) and then has
that value squared.  Clearly that isn't on a scale of 1.0 to 0.0.

I feel like there may be some mathematical trickery going on and that
maybe the actual score values themselves don't matter inside the
ranking code, so long as their relative values to one another.


Pretty much.

This then makes me ponder how the normalization process is done
between queries, allowing for a mix'n'match of results as these
numbers spill to the outside world.  Obviously normalization has to
happen at that point for the mixing query results magic to work.


Lucene doesn't currently do this "mixing", and it's not really clear
to me how it should be done.

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: The values which compute scores.

Reply via email to