RE: Interpreting the score asociated with the Term? |

Sale, Doug Thu, 23 Jan 2003 10:00:53 -0800
agreed (+0)

> -----Original Message-----
> From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, January 23, 2003 11:07 AM
> To: Lucene Users List
> Subject: Re: Interpreting the score asociated with the Term? |
> 
> 
> Otis,
> 
> I think the effort you made in your previous message (to 
> describe the basic
> relevance measures in simple, non-algorithmic terms) is very 
> important.  If
> you think that list is reasonably comprehensive (that is, it 
> captures most
> of what relevance means), I'd urge you to insert this into the
> documentation.  I think it is very valuable.
> 
> Regards,
> 
> Terry
> 
> ----- Original Message -----
> From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Thursday, January 23, 2003 12:02 PM
> Subject: Re: Interpreting the score asociated with the Term? |
> 
> 
> > Yes, I believe so.
> >
> > --- Terry Steichen <[EMAIL PROTECTED]> wrote:
> > > Otis,
> > >
> > > Didn't somebody (Doug?) also mention that a keyword in a shorter
> > > document is
> > > deemed more significant than in a longer one (because, I guess, it
> > > represents a larger percentage of the document)?
> > >
> > > Regards,
> > >
> > > Terry
> > > ----- Original Message -----
> > > From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> > > To: "Lucene Users List" <[EMAIL PROTECTED]>;
> > > <[EMAIL PROTECTED]>
> > > Sent: Thursday, January 23, 2003 10:58 AM
> > > Subject: Re: Interpreting the score asociated with the Term? |
> > >
> > >
> > > > Here is a simplified explanation of some basic stuff.
> > > >
> > > > 1. the more frequent the term (in a collection) the lower its
> > > weight
> > > > (significance).  Makes sense - very popular words don't 
> distinguish
> > > one
> > > > document from the other much, because they are present 
> in so many
> > > docs.
> > > >
> > > > 2. the more frequent a word in a single document, the higher the
> > > > documents 'value' when the query contains that word.  
> So the score
> > > goes
> > > > up for frequent words in a document, esp. if they are 
> not frequent
> > > in
> > > > other documents in the collection.
> > > >
> > > > 3. there is a boost factor which allow you to boost 
> certain terms
> > > at
> > > > query time (e.g. you value matches in title field more than the
> > > body
> > > > field?  boost title field queries)
> > > >
> > > > 4. normalization factor, I believe, normalizes things so that
> > > longer
> > > > documents don't have advantage over shorter ones.
> > > >
> > > > There is more to this....but I am already not 100% 
> about all of the
> > > > above, so I'll stop here :)
> > > >
> > > > Also note that you can boost fields at index time 
> (you'll have to
> > > use
> > > > the nightly build for that instead of the 1.2 release 
> to get this,
> > > I
> > > > believe).
> > > >
> > > > Otis
> > > >
> > > >
> > > > --- Rishabh Bajpai <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > I am using Lucene as a Search Engine for my work. I am new to
> > > this,
> > > > > so forgive me if I am asking a cliched question!
> > > > >
> > > > > I need to understand how the SCORE for the search TERMs is
> > > calculated
> > > > > for Lucene, so that indexing can be appropriately be 
> designed to
> > > > > return the most relevant results, when searched.
> > > > >
> > > > > On the official FAQ page of the Lucene site, a 
> formula is listed
> > > as
> > > > > score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t 
> / norm_d_t *
> > > > > boost_t) * coord_q_d
> > > > > where:
> > > > >   score_d   : score for document d
> > > > >   sum_t     : sum for all terms t
> > > > >   tf_q      : the square root of the frequency of t 
> in the query
> > > > >   tf_d      : the square root of the frequency of t in d
> > > > >   idf_t     : log(numDocs/docFreq_t+1) + 1.0
> > > > >   numDocs   : number of documents in index
> > > > >   docFreq_t : number of documents containing t
> > > > >   norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
> > > > >   norm_d_t  : square root of number of tokens in d in the same
> > > field
> > > > > as t
> > > > >   boost_t   : the user-specified boost for term t
> > > > >   coord_q_d : number of terms in both query and 
> document / number
> > > of
> > > > > terms in query
> > > > >
> > > > > I didnot find the formula too helpful in figuring out what
> > > exactly
> > > > > the score is trying to calculate.
> > > > >
> > > > > I want to know of a logic that can be used for 
> translating this
> > > score
> > > > > into something that can be used for determining which 
> Terms are
> > > more
> > > > > relevant for a given Search Request.
> > > > >
> > > > > One way would be to just assume that - higher the score, more
> > > > > relveant is the search. But is this assumption really 
> valid? Or
> > > are
> > > > > there any possible caveats to this?
> > > > >
> > > > > -Rishabh
> > > > >
> > > > >
> > > > >
> > > > > _____________________________________________________________
> > > > > Get 25MB, POP3, Spam Filtering with LYCOS MAIL PLUS for
> > > $19.95/year.
> > > > >
> > > 
> http://login.mail.lycos.com/brandPage.shtml?pageId=plus&ref=lmtplus
> > > > >
> > > > > --
> > > > > To unsubscribe, e-mail:
> > > > > <mailto:[EMAIL PROTECTED]>
> > > > > For additional commands, e-mail:
> > > > > <mailto:[EMAIL PROTECTED]>
> > > > >
> > > >
> > > >
> > > > __________________________________________________
> > > > Do you Yahoo!?
> > > > Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> > > > http://mailplus.yahoo.com
> > > >
> > > > --
> > > > To unsubscribe, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > > > For additional commands, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > > >
> > > >
> > >
> > >
> > > --
> > > To unsubscribe, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > > For additional commands, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > >
> >
> >
> > __________________________________________________
> > Do you Yahoo!?
> > Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> > http://mailplus.yahoo.com
> >
> > --
> > To unsubscribe, e-mail:
> <mailto:[EMAIL PROTECTED]>
> > For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
> >
> >
> 
> 
> --
> To unsubscribe, e-mail:   
> <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: 
> <mailto:[EMAIL PROTECTED]>
>
RE: Interpreting the score asociated with the Term? |

Reply via email to