No, they should not be compared. Scores are only meaningful relative to each other within the results of a single input query, despite what the queryNorm docs say. queryNorm was an attempt to make scores comparable across queries, but my understanding of the research is that they are still not comparable.
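
A minimal sketch of why a per-query constant does not make scores comparable, assuming the classic DefaultSimilarity definition queryNorm = 1/sqrt(sumOfSquaredWeights) and a simplified one-term score of tf * idf^2 * queryNorm (boosts, coord, and length norms are left out). The class and method names are hypothetical, and the tf and idf values are made up for illustration:

```java
// Hypothetical illustration only; this is not Lucene code.
class QueryNormSketch {

    // Assumed classic formula: queryNorm = 1 / sqrt(sumOfSquaredWeights).
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // Simplified single-term score: tf * idf^2 * queryNorm.
    // With one term and boost 1, sumOfSquaredWeights is just idf^2.
    static double score(double tf, double idf) {
        double sumOfSquaredWeights = idf * idf;
        return tf * idf * idf * queryNorm(sumOfSquaredWeights);
    }

    public static void main(String[] args) {
        double tf = 2.0;          // same term-frequency factor in both cases
        double rareIdf = 4.0;     // made-up idf for a rare term
        double commonIdf = 1.5;   // made-up idf for a common term

        System.out.println("rare-term query score:   " + score(tf, rareIdf));
        System.out.println("common-term query score: " + score(tf, commonIdf));
    }
}
```

Under these assumptions the same document, with the same term-frequency factor, scores about 8 for the rare-term query and about 3 for the common-term query. The difference comes from how rare each term is in the index, not from how well the document matches, so the two raw scores say nothing about relative relevance across the queries.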

-Grant

On Nov 18, 2008, at 12:02 PM, Ng Vinny wrote:

Hi all,

I am wondering if the raw scores obtained from HitCollector can be used to
compare relevance of documents to different queries?

E.g. two phrase queries are issued (PQ1: "Barack Obama" and PQ2: "John McCain"). If a document (doc1) belongs to the result sets of both queries and has a raw score of 5 for PQ1 and 3 for PQ2, can I say that doc1 is
more relevant to "Barack Obama" than to "John McCain"?

There have been some previous discussions about this at [1,2]. On the other
hand, the javadoc of the Similarity class [3] says: "queryNorm(q) is a
normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied by the same factor), but rather just attempts to make scores from
different queries (or even different indexes) comparable."

Please advise.

Thanks.
Ng.

[1] http://thread.gmane.org/gmane.comp.jakarta.lucene.user/10760/focus=10810
[2] http://www.gossamer-threads.com/lists/lucene/java-user/35051?search_string=compare%20score%20across%20queries;#35051
[3] http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
