If I understand your problem space, you're trying to compare scores across two different queries. Don't do this, it is meaningless. Scores are only valid for comparing documents' relevance within a single query. How would one compare scores between two queries like text:nonsense name:terrible ? Different fields, different values. What does it mean that the score for a doc matching the first query has a higher or lower score than the first doc matching the second query?
As for how scores are > 1, have you looked at the scoring formula here? http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html and here? http://lucene.apache.org/java/2_4_0/scoring.html#Understanding the Scoring Formula Don't forget that there can also be boosts... Best Erick On Wed, Dec 1, 2010 at 7:56 AM, Hassan Saneifar <hassan.sanei...@lirmm.fr>wrote: > Hello, > I'm using lucene to retrieve relevant segments of a corpus based on a > given query. Every segment is represented as Document in the indexing. > Once the relevant segments are retrieved, I search for a Regex in them > to capture the requested information. It can happens that I found the > regex in two or more documents retrieved by lucene. Thus, I would like > to show a confidence score to each captured Regex. To make simple, I > means if the regex is found in the first retrieved document and in > fourth, the one retrieved in the first document is more certain to be > relevant since the its document was better scored than other one. > > To show a confidence score, I tried to use scores returned by Lucene. > But, the lucene score are arbitrary and not normalized. So it's not > relevant to use. I wonder how we can have an arbitrary score (>1) when > the default scoring system in lucene is based on cosine measure. In a > simple case, the cosine score between two document vectors (obtained by > tf-idf) is between 0 and 1. > > Since the lucene score is arbitrary and not normalized, if I want to > verify which query is more relevant to retrieve a document, how can I > compare the score of the first retrieved documents corresponding to each > query ? Is there any way to give a confidence score to each retrieved > document which make the comparison of the results of the different > search using the different queries possible? > > Many thank > Best regards. > > -- > Hassan Saneifar, PhD. student, > University Montpellier II > LIRMM laboratory > 161 rue Ada 34392 Montpellier, France > Tel: +33 (0)467 41 85 41 > Fax: +33 (0)467 41 85 00 > URL: http://www.lirmm.fr/~saneifar > > -- > Hassan Saneifar, R&D engineer > Satin IP Technologies > Tel: +33 (0)467 13 00 86 > URL: http://www.satin-ip.com > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >