Re: Interpreting the score associated with the Term?

Otis Gospodnetic Sat, 25 Jan 2003 20:41:00 -0800

Maybe we'll do this in the future, but what I provided is really basic,
not Lucene-specific info.  If somebody else describes it in detail, I'll
gladly put it up on the site.


Otis

--- Terry Steichen <[EMAIL PROTECTED]> wrote:
> Otis,
> 
> I think the effort you made in your previous message (to describe the
> basic relevance measures in simple, non-algorithmic terms) is very
> important.  If you think that list is reasonably comprehensive (that is,
> it captures most of what relevance means), I'd urge you to insert this
> into the documentation.  I think it is very valuable.
> 
> Regards,
> 
> Terry
> 
> ----- Original Message -----
> From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Thursday, January 23, 2003 12:02 PM
> Subject: Re: Interpreting the score associated with the Term?
> 
> 
> > Yes, I believe so.
> >
> > --- Terry Steichen <[EMAIL PROTECTED]> wrote:
> > > Otis,
> > >
> > > Didn't somebody (Doug?) also mention that a keyword in a shorter
> > > document is deemed more significant than in a longer one (because,
> > > I guess, it represents a larger percentage of the document)?
> > >
> > > Regards,
> > >
> > > Terry
> > > ----- Original Message -----
> > > From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> > > To: "Lucene Users List" <[EMAIL PROTECTED]>;
> > > <[EMAIL PROTECTED]>
> > > Sent: Thursday, January 23, 2003 10:58 AM
> > > Subject: Re: Interpreting the score associated with the Term?
> > >
> > >
> > > > Here is a simplified explanation of some basic stuff.
> > > >
> > > > 1. The more frequent the term (in a collection), the lower its
> > > > weight (significance).  Makes sense - very popular words don't
> > > > distinguish one document from another much, because they are
> > > > present in so many docs.
> > > >
> > > > 2. The more frequent a word in a single document, the higher the
> > > > document's 'value' when the query contains that word.  So the score
> > > > goes up for frequent words in a document, esp. if they are not
> > > > frequent in other documents in the collection.
> > > >
> > > > 3. There is a boost factor which allows you to boost certain terms
> > > > at query time (e.g. you value matches in the title field more than
> > > > the body field?  Boost title field queries).
> > > >
> > > > 4. The normalization factor, I believe, normalizes things so that
> > > > longer documents don't have an advantage over shorter ones.
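
Points 1, 2 and 4 above can be sketched as three small functions. This is only an illustration that borrows the definitions from the FAQ formula quoted further down in this thread; it is not Lucene's own code. The class name `RelevanceFactors`, the method names, and the sample numbers are all hypothetical, and the grouping `numDocs / (docFreq + 1)` inside the logarithm is an assumption about what the FAQ means.

```java
public class RelevanceFactors {

    // Point 2: a term counts for more the more often it occurs in one
    // document. The square root dampens the effect, as in the FAQ's tf_d.
    static double tf(int freqInDoc) {
        return Math.sqrt(freqInDoc);
    }

    // Point 1: a term counts for less the more documents contain it.
    // Assumed grouping: log(numDocs / (docFreq + 1)) + 1.0.
    static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Point 4: dividing by the square root of the field length keeps long
    // documents from winning just because they contain more tokens.
    static double lengthNorm(int numTokensInField) {
        return 1.0 / Math.sqrt(numTokensInField);
    }

    public static void main(String[] args) {
        int numDocs = 1000; // hypothetical collection size
        System.out.println("idf, term in 9 docs:   " + idf(numDocs, 9));
        System.out.println("idf, term in 899 docs: " + idf(numDocs, 899));
        System.out.println("tf, 1 occurrence:      " + tf(1));
        System.out.println("tf, 4 occurrences:     " + tf(4));
        System.out.println("norm, 100 tokens:      " + lengthNorm(100));
        System.out.println("norm, 10000 tokens:    " + lengthNorm(10000));
    }
}
```

Running it shows the rare term getting the larger idf, the repeated term the larger tf, and the shorter field the larger length norm, which is the direction each point describes.
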
> > > >
> > > > There is more to this... but I am already not 100% sure about all
> > > > of the above, so I'll stop here :)
> > > >
> > > > Also note that you can boost fields at index time (you'll have to
> > > > use the nightly build for that instead of the 1.2 release to get
> > > > this, I believe).
> > > >
> > > > Otis
> > > >
> > > >
> > > > --- Rishabh Bajpai <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > I am using Lucene as a Search Engine for my work. I am new to
> > > > > this, so forgive me if I am asking a cliched question!
> > > > >
> > > > > I need to understand how the SCORE for the search TERMs is
> > > > > calculated for Lucene, so that indexing can be appropriately
> > > > > designed to return the most relevant results when searched.
> > > > >
> > > > > On the official FAQ page of the Lucene site, a formula is listed
> > > > > as
> > > > >   score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t
> > > > >             * boost_t) * coord_q_d
> > > > > where:
> > > > >   score_d   : score for document d
> > > > >   sum_t     : sum for all terms t
> > > > >   tf_q      : the square root of the frequency of t in the query
> > > > >   tf_d      : the square root of the frequency of t in d
> > > > >   idf_t     : log(numDocs/(docFreq_t+1)) + 1.0
> > > > >   numDocs   : number of documents in index
> > > > >   docFreq_t : number of documents containing t
> > > > >   norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
> > > > >   norm_d_t  : square root of number of tokens in d in the same
> > > > >               field as t
> > > > >   boost_t   : the user-specified boost for term t
> > > > >   coord_q_d : number of terms in both query and document /
> > > > >               number of terms in query
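
To make the formula above more concrete, here is a rough Java sketch that evaluates it term by term for a single document. It is a reading of the FAQ formula under stated assumptions, not Lucene's implementation: the class `FaqScoreSketch`, its method names, and the sample counts are hypothetical; one boost value is applied to every term for brevity; all terms are assumed to live in one field; and idf is read as log(numDocs / (docFreq_t + 1)) + 1.0.

```java
import java.util.Map;

public class FaqScoreSketch {

    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Assumed grouping: log(numDocs / (docFreq + 1)) + 1.0
    static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // queryFreq:     term -> frequency of the term in the query
    // termFreqInDoc: term -> frequency of the term in document d
    // docFreq:       term -> number of documents containing the term
    static double score(Map<String, Integer> queryFreq,
                        Map<String, Integer> termFreqInDoc,
                        Map<String, Integer> docFreq,
                        int numDocs, int numTokensInField, double boost) {
        if (queryFreq.isEmpty()) {
            return 0.0; // nothing to match, and avoids 0/0 in coord
        }

        // norm_q = sqrt(sum_t((tf_q * idf_t)^2)) over all query terms
        double sumSquares = 0.0;
        for (Map.Entry<String, Integer> e : queryFreq.entrySet()) {
            double w = tf(e.getValue())
                    * idf(numDocs, docFreq.getOrDefault(e.getKey(), 0));
            sumSquares += w * w;
        }
        double normQ = Math.sqrt(sumSquares);

        // norm_d_t = sqrt(number of tokens in the field); one field assumed
        double normD = Math.sqrt(numTokensInField);

        double sum = 0.0;
        int matched = 0;
        for (Map.Entry<String, Integer> e : queryFreq.entrySet()) {
            int freqInDoc = termFreqInDoc.getOrDefault(e.getKey(), 0);
            if (freqInDoc == 0) {
                continue; // term absent from d contributes nothing
            }
            matched++;
            double idfT = idf(numDocs, docFreq.getOrDefault(e.getKey(), 0));
            sum += tf(e.getValue()) * idfT / normQ
                    * tf(freqInDoc) * idfT / normD * boost;
        }

        // coord_q_d = terms in both query and document / terms in query
        double coord = (double) matched / queryFreq.size();
        return sum * coord;
    }

    public static void main(String[] args) {
        Map<String, Integer> query = Map.of("lucene", 1, "score", 1);
        Map<String, Integer> docFreq = Map.of("lucene", 9, "score", 99);
        Map<String, Integer> docA = Map.of("lucene", 2, "score", 1);
        Map<String, Integer> docB = Map.of("lucene", 2);

        System.out.println("docA: " + score(query, docA, docFreq, 1000, 100, 1.0));
        System.out.println("docB: " + score(query, docB, docFreq, 1000, 100, 1.0));
    }
}
```

In this toy setup the document matching both query terms scores higher than the one matching only one, partly through the extra term in the sum and partly through the coord factor. The absolute numbers only mean something relative to other documents scored against the same query.
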
> > > > >
> > > > > I did not find the formula too helpful in figuring out what
> > > > > exactly the score is trying to calculate.
> > > > >
> > > > > I want to know of a logic that can be used for translating this
> > > > > score into something that can be used for determining which Terms
> > > > > are more relevant for a given Search Request.
> > > > >
> > > > > One way would be to just assume that the higher the score, the
> > > > > more relevant the search result. But is this assumption really
> > > > > valid? Or are there any possible caveats to this?
> > > > >
> > > > > -Rishabh
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > To unsubscribe, e-mail:
> > > > > <mailto:[EMAIL PROTECTED]>
> > > > > For additional commands, e-mail:
> > > > > <mailto:[EMAIL PROTECTED]>
> > > > >

