Hi All, I am using Lucene as a Search Engine for my work. I am new to this, so forgive me if I am asking a cliched question!
I need to understand how Lucene calculates the SCORE for search TERMs, so that the index can be designed to return the most relevant results. The official Lucene FAQ page lists this formula:

    score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) * coord_q_d

where:

    score_d   : score for document d
    sum_t     : sum over all terms t
    tf_q      : square root of the frequency of t in the query
    tf_d      : square root of the frequency of t in d
    idf_t     : log(numDocs/(docFreq_t+1)) + 1.0
    numDocs   : number of documents in the index
    docFreq_t : number of documents containing t
    norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
    norm_d_t  : square root of the number of tokens in d in the same field as t
    boost_t   : user-specified boost for term t
    coord_q_d : (number of terms in both query and document) / (number of terms in query)

I did not find the formula very helpful for figuring out what exactly the score is trying to measure. I would like some logic for translating this score into something I can use to decide what is most relevant for a given search request. One way would be simply to assume that the higher the score, the more relevant the result. But is this assumption really valid, or are there caveats?

-Rishabh
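To make the FAQ formula above easier to follow, here is a rough plain-Java sketch that evaluates it term by term. It is not Lucene's API and not Lucene's actual implementation: the class name `FaqScore` and its parameters are hypothetical, it assumes a single field per document (so `norm_d_t` is just the square root of the document's token count), treats a missing boost as 1.0, uses the natural logarithm for `idf_t`, and ignores implementation details such as how Lucene stores norms.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper that evaluates the scoring formula quoted from the Lucene FAQ.
 * Plain Java, NOT Lucene's API. Assumes one field per document and a non-empty query.
 */
public class FaqScore {

    // idf_t = log(numDocs / (docFreq_t + 1)) + 1.0, natural log
    static double idf(int numDocs, int docFreq) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /**
     * @param queryTf   term -> frequency of the term in the query
     * @param docTf     term -> frequency of the term in document d
     * @param docLength number of tokens in d's (single) field, used for norm_d_t
     * @param docFreq   term -> number of documents containing the term
     * @param numDocs   number of documents in the index
     * @param boost     term -> user-specified boost (missing terms default to 1.0)
     */
    public static double score(Map<String, Integer> queryTf, Map<String, Integer> docTf,
                               int docLength, Map<String, Integer> docFreq, int numDocs,
                               Map<String, Double> boost) {
        // norm_q = sqrt(sum_t((tf_q * idf_t)^2)) over all query terms
        double sumSq = 0.0;
        for (Map.Entry<String, Integer> e : queryTf.entrySet()) {
            double w = Math.sqrt(e.getValue()) * idf(numDocs, docFreq.getOrDefault(e.getKey(), 0));
            sumSq += w * w;
        }
        double normQ = Math.sqrt(sumSq);
        double normD = Math.sqrt(docLength); // norm_d_t: sqrt of token count in the field

        double sum = 0.0;
        int matched = 0;
        for (Map.Entry<String, Integer> e : queryTf.entrySet()) {
            String t = e.getKey();
            int fd = docTf.getOrDefault(t, 0);
            if (fd == 0) continue; // a query term absent from d adds nothing
            matched++;
            double idfT = idf(numDocs, docFreq.getOrDefault(t, 0));
            double tfQ = Math.sqrt(e.getValue());
            double tfD = Math.sqrt(fd);
            sum += (tfQ * idfT / normQ) * (tfD * idfT / normD) * boost.getOrDefault(t, 1.0);
        }
        // coord_q_d = matched query terms / total query terms
        double coord = (double) matched / queryTf.size();
        return sum * coord;
    }

    public static void main(String[] args) {
        Map<String, Integer> query = new HashMap<>();
        query.put("lucene", 1);
        query.put("score", 1);

        Map<String, Integer> df = new HashMap<>();
        df.put("lucene", 3);
        df.put("score", 9);

        // Doc A: short (10 tokens), contains both query terms
        Map<String, Integer> docA = new HashMap<>();
        docA.put("lucene", 2);
        docA.put("score", 1);

        // Doc B: longer (40 tokens), contains only "lucene"
        Map<String, Integer> docB = new HashMap<>();
        docB.put("lucene", 2);

        Map<String, Double> noBoost = new HashMap<>();
        double a = score(query, docA, 10, df, 100, noBoost);
        double b = score(query, docB, 40, df, 100, noBoost);
        System.out.printf("A=%.4f B=%.4f%n", a, b);
    }
}
```

In this toy example, doc A scores higher than doc B because it is shorter (smaller `norm_d_t`) and matches both query terms (`coord_q_d` of 1 instead of 0.5). Note that `norm_q` depends only on the query, so it scales every document's score for that query by the same factor and does not change the ordering of results within one query.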
