Maybe we'll do this in the future, but what I provided is really basic, not Lucene specific info. If somebody else describes it in detail I'll gladly put it up on the site.
Otis --- Terry Steichen <[EMAIL PROTECTED]> wrote: > Otis, > > I think the effort you made in your previous message (to describe the > basic > relevance measures in simple, non-algorithmic terms) is very > important. If > you think that list is reasonably comprehensive (that is, it captures > most > of what relevance means), I'd urge you to insert this into the > documentation. I think it is very valuable. > > Regards, > > Terry > > ----- Original Message ----- > From: "Otis Gospodnetic" <[EMAIL PROTECTED]> > To: "Lucene Users List" <[EMAIL PROTECTED]> > Sent: Thursday, January 23, 2003 12:02 PM > Subject: Re: Interpreting the score asociated with the Term? | > > > > Yes, I believe so. > > > > --- Terry Steichen <[EMAIL PROTECTED]> wrote: > > > Otis, > > > > > > Didn't somebody (Doug?) also mention that a keyword in a shorter > > > document is > > > deemed more significant than in a longer one (because, I guess, > it > > > represents a larger percentage of the document)? > > > > > > Regards, > > > > > > Terry > > > ----- Original Message ----- > > > From: "Otis Gospodnetic" <[EMAIL PROTECTED]> > > > To: "Lucene Users List" <[EMAIL PROTECTED]>; > > > <[EMAIL PROTECTED]> > > > Sent: Thursday, January 23, 2003 10:58 AM > > > Subject: Re: Interpreting the score asociated with the Term? | > > > > > > > > > > Here is a simplified explanation of some basic stuff. > > > > > > > > 1. the more frequent the term (in a collection) the lower its > > > weight > > > > (significance). Makes sense - very popular words don't > distinguish > > > one > > > > document from the other much, because they are present in so > many > > > docs. > > > > > > > > 2. the more frequent a word in a single document, the higher > the > > > > documents 'value' when the query contains that word. So the > score > > > goes > > > > up for frequent words in a document, esp. if they are not > frequent > > > in > > > > other documents in the collection. > > > > > > > > 3. there is a boost factor which allow you to boost certain > terms > > > at > > > > query time (e.g. you value matches in title field more than the > > > body > > > > field? boost title field queries) > > > > > > > > 4. normalization factor, I believe, normalizes things so that > > > longer > > > > documents don't have advantage over shorter ones. > > > > > > > > There is more to this....but I am already not 100% about all of > the > > > > above, so I'll stop here :) > > > > > > > > Also note that you can boost fields at index time (you'll have > to > > > use > > > > the nightly build for that instead of the 1.2 release to get > this, > > > I > > > > believe). > > > > > > > > Otis > > > > > > > > > > > > --- Rishabh Bajpai <[EMAIL PROTECTED]> wrote: > > > > > > > > > > Hi All, > > > > > > > > > > I am using Lucene as a Search Engine for my work. I am new to > > > this, > > > > > so forgive me if I am asking a cliched question! > > > > > > > > > > I need to understand how the SCORE for the search TERMs is > > > calculated > > > > > for Lucene, so that indexing can be appropriately be designed > to > > > > > return the most relevant results, when searched. > > > > > > > > > > On the official FAQ page of the Lucene site, a formula is > listed > > > as > > > > > score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / > norm_d_t * > > > > > boost_t) * coord_q_d > > > > > where: > > > > > score_d : score for document d > > > > > sum_t : sum for all terms t > > > > > tf_q : the square root of the frequency of t in the > query > > > > > tf_d : the square root of the frequency of t in d > > > > > idf_t : log(numDocs/docFreq_t+1) + 1.0 > > > > > numDocs : number of documents in index > > > > > docFreq_t : number of documents containing t > > > > > norm_q : sqrt(sum_t((tf_q*idf_t)^2)) > > > > > norm_d_t : square root of number of tokens in d in the > same > > > field > > > > > as t > > > > > boost_t : the user-specified boost for term t > > > > > coord_q_d : number of terms in both query and document / > number > > > of > > > > > terms in query > > > > > > > > > > I didnot find the formula too helpful in figuring out what > > > exactly > > > > > the score is trying to calculate. > > > > > > > > > > I want to know of a logic that can be used for translating > this > > > score > > > > > into something that can be used for determining which Terms > are > > > more > > > > > relevant for a given Search Request. > > > > > > > > > > One way would be to just assume that - higher the score, more > > > > > relveant is the search. But is this assumption really valid? > Or > > > are > > > > > there any possible caveats to this? > > > > > > > > > > -Rishabh > > > > > > > > > > > > > > > > > > > > _____________________________________________________________ > > > > > Get 25MB, POP3, Spam Filtering with LYCOS MAIL PLUS for > > > $19.95/year. > > > > > > > > > http://login.mail.lycos.com/brandPage.shtml?pageId=plus&ref=lmtplus > > > > > > > > > > -- > > > > > To unsubscribe, e-mail: > > > > > <mailto:[EMAIL PROTECTED]> > > > > > For additional commands, e-mail: > > > > > <mailto:[EMAIL PROTECTED]> > > > > > > > > > > > > > > > > > __________________________________________________ > > > > Do you Yahoo!? > > > > Yahoo! Mail Plus - Powerful. Affordable. Sign up now. > > > > http://mailplus.yahoo.com > > > > > > > > -- > > > > To unsubscribe, e-mail: > > > <mailto:[EMAIL PROTECTED]> > > > > For additional commands, e-mail: > > > <mailto:[EMAIL PROTECTED]> > > > > > > > > > > > > > > > > > -- > > > To unsubscribe, e-mail: > > > <mailto:[EMAIL PROTECTED]> > > > For additional commands, e-mail: > > > <mailto:[EMAIL PROTECTED]> > > > > > > > > > __________________________________________________ > > Do you Yahoo!? > > Yahoo! Mail Plus - Powerful. Affordable. Sign up now. > > http://mailplus.yahoo.com > > > > -- > > To unsubscribe, e-mail: > <mailto:[EMAIL PROTECTED]> > > For additional commands, e-mail: > <mailto:[EMAIL PROTECTED]> > > > > > > > -- > To unsubscribe, e-mail: > <mailto:[EMAIL PROTECTED]> > For additional commands, e-mail: > <mailto:[EMAIL PROTECTED]> > __________________________________________________ Do you Yahoo!? Yahoo! Mail Plus - Powerful. Affordable. Sign up now. http://mailplus.yahoo.com -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
