Re: Lucene's Scoring & Regular TF-IDF

Grant Ingersoll Mon, 10 Mar 2008 15:39:40 -0700

Have a look at DefaultSimilarity.java. I think the math works once you follow the formula. See http://lucene.apache.org/java/2_3_1/scoring.html


HTH, Grant


On Mar 10, 2008, at 6:05 PM, João Rodrigues wrote:

Hello all!
I asked here a few days ago if I could get a "raw" tf-idf score out of Lucene's methods. I was kindly advised to hack my way through the "explain" method. I have, but I can't make any sense of the information stated there. Here's a print from a search.explain. My comments & doubts are inline, in bold:


Lucene Score: 1.000000
Explanation:

1.9983159 = (MATCH) weight(contents:chaperone in 73615), product of:
 0.99999994 = queryWeight(contents:chaperone), product of:
   7.3838615 = idf(docFreq=137, numDocs=81725) *-> I calculated this as 2.7756 or 6.3911 (if using Log or Ln)*


From DefaultSimilarity.java:

public float idf(int docFreq, int numDocs) {
    return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
}

ln(81725/(137+1)) + 1 = 6.38 + 1 = 7.38
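
To make the gap between the three numbers concrete, here is a small sketch that puts the quoted idf() arithmetic next to the two textbook variants the question seems to have used. The class name IdfCheck and the two textbook helper methods are made up for illustration; only luceneIdf mirrors the method quoted above.

```java
final class IdfCheck {
    // Same arithmetic as the idf() method quoted above from DefaultSimilarity.java:
    // natural log, docFreq + 1 in the denominator, and 1.0 added to the result.
    static float luceneIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Hypothetical helper: textbook idf with natural log, ln(N / df).
    static double textbookIdfLn(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) docFreq);
    }

    // Hypothetical helper: textbook idf with base-10 log, log10(N / df).
    static double textbookIdfLog10(int docFreq, int numDocs) {
        return Math.log10(numDocs / (double) docFreq);
    }

    public static void main(String[] args) {
        int docFreq = 137;    // docFreq from the explain output
        int numDocs = 81725;  // numDocs from the explain output
        System.out.println("log10(N/df)        = " + textbookIdfLog10(docFreq, numDocs));
        System.out.println("ln(N/df)           = " + textbookIdfLn(docFreq, numDocs));
        System.out.println("ln(N/(df+1)) + 1.0 = " + luceneIdf(docFreq, numDocs));
    }
}
```

The first two lines should land near the 2.7756 and 6.3911 from the question, and the third near the 7.3838615 that explain reports, so the difference comes down to the +1 in the denominator and the 1.0 added afterwards.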

   0.13543049 = queryNorm
 1.998316 = (MATCH) fieldWeight(contents:chaperone in 73615), product of:
   1.7320508 = tf(termFreq(contents:chaperone)=3) *-> The doc has 32 tokens (according to Luke) and 3/32 != 1.7320508*


DefaultSimilarity's tf() is the square root of the term frequency, not termFreq/numTokens: 3^0.5 = 1.73...

   7.3838615 = idf(docFreq=137, numDocs=81725)
   0.15625 = fieldNorm(field=contents, doc=73615)
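
As a cross-check, the whole explain tree can be multiplied back together from its leaves. The sketch below does that with plain arithmetic and does not call Lucene. The class name ExplainCheck and its method names are made up for illustration, and queryNorm and fieldNorm are simply copied from the explain output above rather than derived.

```java
final class ExplainCheck {
    // tf as DefaultSimilarity computes it: square root of the raw term frequency.
    static float tf(int termFreq) {
        return (float) Math.sqrt(termFreq);
    }

    // idf with the same arithmetic as the DefaultSimilarity.java method quoted in this thread.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // queryWeight line of the explain output: idf * queryNorm.
    static float queryWeight(int docFreq, int numDocs, float queryNorm) {
        return idf(docFreq, numDocs) * queryNorm;
    }

    // fieldWeight line of the explain output: tf * idf * fieldNorm.
    static float fieldWeight(int termFreq, int docFreq, int numDocs, float fieldNorm) {
        return tf(termFreq) * idf(docFreq, numDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        int termFreq = 3, docFreq = 137, numDocs = 81725;
        float queryNorm = 0.13543049f; // copied from the explain output
        float fieldNorm = 0.15625f;    // copied from the explain output

        float qw = queryWeight(docFreq, numDocs, queryNorm);
        float fw = fieldWeight(termFreq, docFreq, numDocs, fieldNorm);
        System.out.println("queryWeight = " + qw);
        System.out.println("fieldWeight = " + fw);
        System.out.println("weight      = " + (qw * fw));
    }
}
```

Up to float rounding, the three printed values should agree with the 0.99999994, 1.998316 and 1.9983159 lines of the explain output. Note that idf enters twice, once on the query side and once on the field side.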

---------------------------------------------------------------------------
So, what am I missing? I read the regular tf-idf rule from Wikipedia, along with some other textbooks I found, so I'm pretty sure it is OK. I didn't set any boost factor or anything (otherwise it would also appear here, I suppose). I am using the Standard Analyzer, which would account for a higher tf, but not one that large.


--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
