Interpreting the score associated with the Term?

Rishabh Bajpai Thu, 23 Jan 2003 02:05:31 -0800

Hi All,

I am using Lucene as a Search Engine for my work. I am new to this, so forgive me if I 
am asking a cliched question!


I need to understand how the SCORE for the search TERMs is calculated in Lucene, so 
that indexing can be designed appropriately to return the most relevant results 
when searched.

On the official FAQ page of the Lucene site, a formula is listed as 
score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) * coord_q_d
where:
  score_d   : score for document d
  sum_t     : sum for all terms t
  tf_q      : the square root of the frequency of t in the query
  tf_d      : the square root of the frequency of t in d
  idf_t     : log(numDocs/(docFreq_t+1)) + 1.0
  numDocs   : number of documents in index
  docFreq_t : number of documents containing t
  norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
  norm_d_t  : square root of number of tokens in d in the same field as t
  boost_t   : the user-specified boost for term t
  coord_q_d : number of terms in both query and document / number of terms in query
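
To make the formula concrete, here is a minimal Python sketch of how I read it. 
This is not Lucene's actual code. It assumes a single field (so norm_d_t is the 
same for every term), reads idf_t as log(numDocs/(docFreq_t+1)) + 1.0, and the 
names idf, score, doc_freqs and boosts are made up for illustration.

```python
import math
from collections import Counter


def idf(num_docs, doc_freq):
    # idf_t, assuming the FAQ means log(numDocs / (docFreq_t + 1)) + 1.0
    return math.log(num_docs / (doc_freq + 1)) + 1.0


def score(query_terms, doc_tokens, doc_freqs, num_docs, boosts=None):
    """Score one document against one query, following the FAQ formula.

    query_terms : list of terms in the query (repeats allowed)
    doc_tokens  : list of tokens in the (single, assumed) field of document d
    doc_freqs   : dict mapping term -> number of documents containing it
    boosts      : optional dict mapping term -> user boost (default 1.0)
    """
    boosts = boosts or {}
    q_counts = Counter(query_terms)
    d_counts = Counter(doc_tokens)
    if not q_counts or not doc_tokens:
        return 0.0

    # norm_q = sqrt(sum_t((tf_q * idf_t)^2))
    norm_q = math.sqrt(sum(
        (math.sqrt(freq) * idf(num_docs, doc_freqs.get(term, 0))) ** 2
        for term, freq in q_counts.items()))
    # norm_d_t = sqrt(number of tokens in d in the same field as t)
    norm_d = math.sqrt(len(doc_tokens))

    total = 0.0
    matched = 0
    for term, freq in q_counts.items():
        if d_counts[term] == 0:
            continue  # tf_d is 0, so this term contributes nothing
        matched += 1
        tf_q = math.sqrt(freq)
        tf_d = math.sqrt(d_counts[term])
        idf_t = idf(num_docs, doc_freqs.get(term, 0))
        total += (tf_q * idf_t / norm_q) * (tf_d * idf_t / norm_d) * boosts.get(term, 1.0)

    # coord_q_d = terms in both query and document / terms in query
    coord = matched / len(q_counts)
    return total * coord
```

With this sketch, a document that contains every query term, or contains a term 
more often, scores higher than one of the same length that matches fewer terms, 
and a document containing none of the query terms scores 0.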

I did not find the formula very helpful in figuring out what exactly the score is trying 
to measure.

I am looking for a way to translate this score into something that can be used to 
determine which Terms are more relevant for a given Search Request.

One way would be to simply assume that the higher the score, the more relevant the 
search result. But is this assumption really valid? Or are there any caveats to it?

-Rishabh



