Yes, I believe so. --- Terry Steichen <[EMAIL PROTECTED]> wrote: > Otis, > > Didn't somebody (Doug?) also mention that a keyword in a shorter > document is > deemed more significant than in a longer one (because, I guess, it > represents a larger percentage of the document)? > > Regards, > > Terry > ----- Original Message ----- > From: "Otis Gospodnetic" <[EMAIL PROTECTED]> > To: "Lucene Users List" <[EMAIL PROTECTED]>; > <[EMAIL PROTECTED]> > Sent: Thursday, January 23, 2003 10:58 AM > Subject: Re: Interpreting the score asociated with the Term? | > > > > Here is a simplified explanation of some basic stuff. > > > > 1. the more frequent the term (in a collection) the lower its > weight > > (significance). Makes sense - very popular words don't distinguish > one > > document from the other much, because they are present in so many > docs. > > > > 2. the more frequent a word in a single document, the higher the > > documents 'value' when the query contains that word. So the score > goes > > up for frequent words in a document, esp. if they are not frequent > in > > other documents in the collection. > > > > 3. there is a boost factor which allow you to boost certain terms > at > > query time (e.g. you value matches in title field more than the > body > > field? boost title field queries) > > > > 4. normalization factor, I believe, normalizes things so that > longer > > documents don't have advantage over shorter ones. > > > > There is more to this....but I am already not 100% about all of the > > above, so I'll stop here :) > > > > Also note that you can boost fields at index time (you'll have to > use > > the nightly build for that instead of the 1.2 release to get this, > I > > believe). > > > > Otis > > > > > > --- Rishabh Bajpai <[EMAIL PROTECTED]> wrote: > > > > > > Hi All, > > > > > > I am using Lucene as a Search Engine for my work. I am new to > this, > > > so forgive me if I am asking a cliched question! > > > > > > I need to understand how the SCORE for the search TERMs is > calculated > > > for Lucene, so that indexing can be appropriately be designed to > > > return the most relevant results, when searched. > > > > > > On the official FAQ page of the Lucene site, a formula is listed > as > > > score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * > > > boost_t) * coord_q_d > > > where: > > > score_d : score for document d > > > sum_t : sum for all terms t > > > tf_q : the square root of the frequency of t in the query > > > tf_d : the square root of the frequency of t in d > > > idf_t : log(numDocs/docFreq_t+1) + 1.0 > > > numDocs : number of documents in index > > > docFreq_t : number of documents containing t > > > norm_q : sqrt(sum_t((tf_q*idf_t)^2)) > > > norm_d_t : square root of number of tokens in d in the same > field > > > as t > > > boost_t : the user-specified boost for term t > > > coord_q_d : number of terms in both query and document / number > of > > > terms in query > > > > > > I didnot find the formula too helpful in figuring out what > exactly > > > the score is trying to calculate. > > > > > > I want to know of a logic that can be used for translating this > score > > > into something that can be used for determining which Terms are > more > > > relevant for a given Search Request. > > > > > > One way would be to just assume that - higher the score, more > > > relveant is the search. But is this assumption really valid? Or > are > > > there any possible caveats to this? > > > > > > -Rishabh > > > > > > > > > > > > _____________________________________________________________ > > > Get 25MB, POP3, Spam Filtering with LYCOS MAIL PLUS for > $19.95/year. > > > > http://login.mail.lycos.com/brandPage.shtml?pageId=plus&ref=lmtplus > > > > > > -- > > > To unsubscribe, e-mail: > > > <mailto:[EMAIL PROTECTED]> > > > For additional commands, e-mail: > > > <mailto:[EMAIL PROTECTED]> > > > > > > > > > __________________________________________________ > > Do you Yahoo!? > > Yahoo! Mail Plus - Powerful. Affordable. Sign up now. > > http://mailplus.yahoo.com > > > > -- > > To unsubscribe, e-mail: > <mailto:[EMAIL PROTECTED]> > > For additional commands, e-mail: > <mailto:[EMAIL PROTECTED]> > > > > > > > -- > To unsubscribe, e-mail: > <mailto:[EMAIL PROTECTED]> > For additional commands, e-mail: > <mailto:[EMAIL PROTECTED]> >
__________________________________________________ Do you Yahoo!? Yahoo! Mail Plus - Powerful. Affordable. Sign up now. http://mailplus.yahoo.com -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
