No, they should not be compared. Scores are only meaningful relative
to each other for a given input query, despite what the queryNorm docs
say. queryNorm was an attempt to make scores comparable across
queries, but my understanding of the research is that they still are
not comparable.
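A rough sketch of why, in plain Java with no Lucene dependency. It
assumes the default queryNorm is 1/sqrt(sumOfSquaredWeights), which is
how I read DefaultSimilarity. The class name, the other documents'
scores, and the sumOfSquaredWeights values are made up for
illustration; only the 5 and 3 come from your example.

```java
class QueryNormSketch {
    // Assumed default formula: 1 / sqrt(sumOfSquaredWeights).
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // Scale every raw score of ONE query by that query's norm.
    static double[] normalize(double[] rawScores, double sumOfSquaredWeights) {
        double norm = queryNorm(sumOfSquaredWeights);
        double[] out = new double[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }

    // Index of the highest-scoring document.
    static int argMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Index 0 is doc1; the other entries are hypothetical documents.
        double[] pq1Raw = {5.0, 2.0, 7.0}; // "Barack Obama"
        double[] pq2Raw = {3.0, 9.0, 1.0}; // "John McCain"

        // Hypothetical sumOfSquaredWeights; in Lucene it depends on
        // each query's own term weights.
        double[] pq1 = normalize(pq1Raw, 4.0);
        double[] pq2 = normalize(pq2Raw, 25.0);

        // Within a query the order of documents is unchanged.
        System.out.println(argMax(pq1Raw) == argMax(pq1));
        System.out.println(argMax(pq2Raw) == argMax(pq2));

        // doc1's two scores were each scaled by a different,
        // query-specific constant.
        System.out.println(pq1[0] + " vs " + pq2[0]);
    }
}
```

Each query's scores get multiplied by a constant derived from that
query alone, so the order within a query is preserved, but nothing in
this arithmetic puts the two queries on a shared scale.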
-Grant
On Nov 18, 2008, at 12:02 PM, Ng Vinny wrote:
Hi all,
I am wondering whether the raw scores obtained from HitCollector can
be used to compare the relevance of documents to different queries.
E.g. two phrase queries are issued (PQ1: "Barack Obama" and
PQ2: "John McCain"). If a document (doc1) belongs to the result sets
of both queries and has a raw score of 5 for PQ1 and 3 for PQ2, can I
say that doc1 is more relevant to "Barack Obama" than to "John McCain"?
There have been some previous discussions about this at [1,2]. On the
other hand, the javadoc of the Similarity class [3] says "queryNorm(q)
is a normalizing factor used to make scores between queries comparable.
This factor does not affect document ranking (since all ranked
documents are multiplied by the same factor), but rather just attempts
to make scores from different queries (or even different indexes)
comparable."
Please advise.
Thanks.
Ng.
[1] http://thread.gmane.org/gmane.comp.jakarta.lucene.user/10760/focus=10810
[2] http://www.gossamer-threads.com/lists/lucene/java-user/35051?search_string=compare%20score%20across%20queries;#35051
[3] http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ