Re: About Hit Scoring

Christoph Goller Sun, 31 Oct 2004 08:53:51 -0800

Chuck Williams wrote:

> That's an interesting point that helps to better analyze the
> situation. It seems to me the units are arbitrary and so the distance
> in this case is not very meaningful. I don't believe Lucene actually
> uses the document vector -- it uses the orthogonal projection of the
> document vector into the hyperspace of query terms, since it only
> considers document vector terms corresponding to query vector terms.


For the distance of a document vector to the query-hyperplane, the
other directions of the document vector are irrelevant.
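
A minimal sketch of this point, assuming plain Python lists as term
vectors and a hyperplane through the origin orthogonal to the query
vector. The function name and the sample vectors are made up for
illustration; this is not Lucene code.

```python
import math

def distance_to_query_hyperplane(query, doc):
    """Distance from the tip of doc to the hyperplane through the origin
    that is orthogonal to query, i.e. (q . d) / |q|."""
    dot = sum(q * d for q, d in zip(query, doc))
    return dot / math.sqrt(sum(q * q for q in query))

# Dimensions: [term_a, term_b, term_c]; the query contains only term_a and term_b.
query = [1.0, 1.0, 0.0]
doc = [1.0, 1.0, 5.0]         # carries a large weight on the non-query term_c
projected = [1.0, 1.0, 0.0]   # same document with the non-query term dropped

full_distance = distance_to_query_hyperplane(query, doc)
projected_distance = distance_to_query_hyperplane(query, projected)
```

Both values are equal, because every component of the document vector
outside the query terms is multiplied by zero in the dot product.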

> The distance
> from the tip of the projected document vector to the hyperplane
> orthogonal to the query vector (within the query hyperspace) does not
> seem that meaningful, even if the units were clear and natural.
> Document vectors at different angles and arbitrarily large distances
> from the query vector can have the same distance to this plane.
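
The objection can be illustrated with a small sketch, assuming
two-dimensional term vectors and hypothetical helper names (none of
this is taken from Lucene). Two documents at different angles to the
query end up at the same distance from the plane:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def plane_distance(query, doc):
    # Distance from the tip of doc to the plane orthogonal to query.
    return dot(query, doc) / norm(query)

def cosine(query, doc):
    # Cosine of the angle between query and doc.
    return dot(query, doc) / (norm(query) * norm(doc))

query = [1.0, 1.0]
doc_aligned = [1.0, 1.0]   # points in the same direction as the query
doc_skewed = [2.0, 0.0]    # 45 degrees away from the query

distances = (plane_distance(query, doc_aligned), plane_distance(query, doc_skewed))
cosines = (cosine(query, doc_aligned), cosine(query, doc_skewed))
```

Both distances come out as sqrt(2), while the cosines are roughly 1.0
and 0.71, so the distance alone does not separate the two documents.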


The term frequency is normalized by the field length, and idf comes in
as well. So the units do at least have some meaning.
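
To make the units a bit more concrete, here is a rough sketch of a term
weight built from those factors. The exact formulas (square-root term
frequency, 1/sqrt(field length) and a log-based idf) are assumptions
chosen for illustration, not a statement of what Lucene's Similarity
computes, and all names are hypothetical.

```python
import math

def term_weight(freq, field_length, doc_freq, num_docs):
    """Illustrative weight of one term in one field (assumed formulas)."""
    tf = math.sqrt(freq)                             # dampened term frequency
    length_norm = 1.0 / math.sqrt(field_length)      # longer fields count less
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))  # rare terms count more
    return tf * length_norm * idf

# Same term frequency, different field lengths and document frequencies.
short_field = term_weight(freq=2, field_length=10, doc_freq=5, num_docs=1000)
long_field = term_weight(freq=2, field_length=1000, doc_freq=5, num_docs=1000)
common_term = term_weight(freq=2, field_length=10, doc_freq=500, num_docs=1000)
```

Under these assumptions the same raw frequency weighs ten times more in
the 10-term field than in the 1000-term field, and a rare term weighs
more than a common one.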

> From a practical standpoint, I still think it is important to have
> meaningful normalized final scores so that applications can interpret
> these scores, for example to present results to users in a manner that
> depends on the relevance of the individual results.  This seems easy to
> do in a natural way along the lines of my last proposal (boost-weighted
> normalization, possibly including some other factors).

I still agree that it would be great to have scores that could be compared
between different queries.

Christoph

