Hi all,

I am wondering whether the raw scores obtained from a HitCollector can be
used to compare the relevance of a document to different queries.

For example, suppose two phrase queries are issued (PQ1: "Barack Obama" and
PQ2: "John McCain"). If a document (doc1) is in the result sets of both
queries, with a raw score of 5 for PQ1 and 3 for PQ2, can I say that doc1 is
more relevant to "Barack Obama" than to "John McCain"?

There have been some previous discussions about this at [1] and [2]. On the
other hand, the javadoc of the Similarity class [3] says: "queryNorm(q) is a
normalizing factor used to make scores between queries comparable. This
factor does not affect document ranking (since all ranked documents are
multiplied by the same factor), but rather just attempts to make scores from
different queries (or even different indexes) comparable."
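
To make sure I am reading that passage correctly, here is a minimal sketch of
my understanding. It is not Lucene code: the class, the method names and the
scores are made up for illustration, and the formula 1/sqrt(sumOfSquaredWeights)
is my assumption about how DefaultSimilarity computes queryNorm. It only shows
that multiplying every score of one query by the same factor leaves the ranking
within that query unchanged.

```java
import java.util.Arrays;
import java.util.Comparator;

public class QueryNormSketch {

    // Assumed form of the normalizer: 1 / sqrt(sum of squared term weights).
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // Returns document indices ordered by descending score.
    static Integer[] rank(float[] scores) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));
        return order;
    }

    // Multiplies every score by the same per-query factor.
    static float[] scale(float[] scores, float factor) {
        float[] scaled = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            scaled[i] = scores[i] * factor;
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Hypothetical unnormalized scores of three documents for one query.
        float[] raw = {5.0f, 3.0f, 4.0f};
        // Hypothetical sum of squared weights for that query.
        float norm = queryNorm(16.0f);
        float[] normalized = scale(raw, norm);

        // Both lines print the same document order.
        System.out.println(Arrays.toString(rank(raw)));
        System.out.println(Arrays.toString(rank(normalized)));
    }
}
```

What the sketch does not tell me is whether the normalized values of two
different queries end up on a common scale, which is my actual question.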

Please advise.

Thanks.
Ng.

[1] http://thread.gmane.org/gmane.comp.jakarta.lucene.user/10760/focus=10810
[2] http://www.gossamer-threads.com/lists/lucene/java-user/35051?search_string=compare%20score%20across%20queries;#35051
[3] http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html
