agreed (+0)
> -----Original Message-----
> From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, January 23, 2003 11:07 AM
> To: Lucene Users List
> Subject: Re: Interpreting the score associated with the Term?
>
>
> Otis,
>
> I think the effort you made in your previous message (to describe the basic
> relevance measures in simple, non-algorithmic terms) is very important. If
> you think that list is reasonably comprehensive (that is, it captures most
> of what relevance means), I'd urge you to insert this into the
> documentation. I think it is very valuable.
>
> Regards,
>
> Terry
>
> ----- Original Message -----
> From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Thursday, January 23, 2003 12:02 PM
> Subject: Re: Interpreting the score associated with the Term?
>
>
> > Yes, I believe so.
> >
> > --- Terry Steichen <[EMAIL PROTECTED]> wrote:
> > > Otis,
> > >
> > > Didn't somebody (Doug?) also mention that a keyword in a shorter
> > > document is deemed more significant than in a longer one (because, I
> > > guess, it represents a larger percentage of the document)?
> > >
> > > Regards,
> > >
> > > Terry
> > > ----- Original Message -----
> > > From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> > > To: "Lucene Users List" <[EMAIL PROTECTED]>;
> > > <[EMAIL PROTECTED]>
> > > Sent: Thursday, January 23, 2003 10:58 AM
> > > Subject: Re: Interpreting the score associated with the Term?
> > >
> > >
> > > > Here is a simplified explanation of some basic stuff.
> > > >
> > > > 1. the more frequent the term (in a collection), the lower its weight
> > > > (significance). Makes sense - very popular words don't distinguish
> > > > one document from another much, because they are present in so many
> > > > docs.
> > > >
> > > > 2. the more frequent a word in a single document, the higher the
> > > > document's 'value' when the query contains that word. So the score
> > > > goes up for frequent words in a document, esp. if they are not
> > > > frequent in other documents in the collection.
> > > >
> > > > 3. there is a boost factor which allows you to boost certain terms at
> > > > query time (e.g. you value matches in the title field more than in the
> > > > body field? boost title field queries)
> > > >
> > > > 4. normalization factor, I believe, normalizes things so that longer
> > > > documents don't have an advantage over shorter ones.
> > > >
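Points 1, 2 and 4 above can be sketched in a few lines of Python. This is only an illustration, not Lucene code: it assumes the definitions from the FAQ formula quoted further down this thread (tf as the square root of the frequency, a length norm equal to the square root of the token count) and reads the FAQ's idf as log(numDocs / (docFreq + 1)) + 1.0. The function names are made up for the example.

```python
import math


def idf(num_docs, doc_freq):
    # Assumed reading of the FAQ's idf_t: log(numDocs / (docFreq_t + 1)) + 1.0
    return math.log(num_docs / (doc_freq + 1)) + 1.0


def tf(freq):
    # Square root of the term's frequency in the document.
    return math.sqrt(freq)


def length_norm(num_tokens):
    # Square root of the number of tokens in the field.
    return math.sqrt(num_tokens)


def term_weight(freq, num_tokens, num_docs, doc_freq):
    # Hypothetical helper: one term's contribution for one document,
    # leaving out the query-side factors, boost and coord.
    return tf(freq) * idf(num_docs, doc_freq) / length_norm(num_tokens)


# Point 1: a term found in few documents weighs more than a common one.
rare = idf(1000, 9)
common = idf(1000, 499)

# Point 2: more occurrences in the same document raise the weight.
once = term_weight(1, 100, 1000, 9)
four_times = term_weight(4, 100, 1000, 9)

# Point 4: the same single match counts for more in a shorter document.
in_short_doc = term_weight(1, 50, 1000, 9)
in_long_doc = term_weight(1, 200, 1000, 9)
```

Because tf is a square root, four occurrences double the weight rather than quadrupling it, so repeating a word has diminishing returns.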
> > > > There is more to this... but I am already not 100% sure about all of
> > > > the above, so I'll stop here :)
> > > >
> > > > Also note that you can boost fields at index time (you'll have to use
> > > > the nightly build for that instead of the 1.2 release to get this, I
> > > > believe).
> > > >
> > > > Otis
> > > >
> > > >
> > > > --- Rishabh Bajpai <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > I am using Lucene as a Search Engine for my work. I am new to this,
> > > > > so forgive me if I am asking a cliched question!
> > > > >
> > > > > I need to understand how the SCORE for the search TERMs is calculated
> > > > > in Lucene, so that indexing can be appropriately designed to
> > > > > return the most relevant results when searched.
> > > > >
> > > > > On the official FAQ page of the Lucene site, a formula is listed as
> > > > > score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t *
> > > > >           boost_t) * coord_q_d
> > > > > where:
> > > > > score_d   : score for document d
> > > > > sum_t     : sum for all terms t
> > > > > tf_q      : the square root of the frequency of t in the query
> > > > > tf_d      : the square root of the frequency of t in d
> > > > > idf_t     : log(numDocs/docFreq_t+1) + 1.0
> > > > > numDocs   : number of documents in index
> > > > > docFreq_t : number of documents containing t
> > > > > norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
> > > > > norm_d_t  : square root of number of tokens in d in the same field
> > > > >             as t
> > > > > boost_t   : the user-specified boost for term t
> > > > > coord_q_d : number of terms in both query and document / number of
> > > > >             terms in query
> > > > >
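For anyone who finds the formula easier to read as code, here is a rough Python transcription of it. It is a sketch under assumptions, not Lucene's actual implementation: documents have a single field, idf is read as log(numDocs / (docFreq + 1)) + 1.0, and the names `score`, `doc_freqs` and `boosts` are invented for the example.

```python
import math
from collections import Counter


def idf(num_docs, doc_freq):
    # Assumed reading of idf_t: log(numDocs / (docFreq_t + 1)) + 1.0
    return math.log(num_docs / (doc_freq + 1)) + 1.0


def score(query_terms, doc_tokens, doc_freqs, num_docs, boosts=None):
    """Score one single-field document against a query, per the FAQ formula.

    doc_freqs maps a term to the number of documents containing it;
    boosts maps a term to its user-specified boost (default 1.0).
    """
    boosts = boosts or {}
    q_counts = Counter(query_terms)
    d_counts = Counter(doc_tokens)

    # norm_q = sqrt(sum_t((tf_q * idf_t)^2))
    norm_q = math.sqrt(sum(
        (math.sqrt(q_freq) * idf(num_docs, doc_freqs.get(t, 0))) ** 2
        for t, q_freq in q_counts.items()
    ))
    # norm_d_t = sqrt(number of tokens in the field); one field assumed here
    norm_d = math.sqrt(len(doc_tokens))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # empty query or empty document

    total = 0.0
    matched = 0
    for t, q_freq in q_counts.items():
        d_freq = d_counts.get(t, 0)
        if d_freq == 0:
            continue  # term not in the document, contributes nothing
        matched += 1
        w = idf(num_docs, doc_freqs.get(t, 0))
        query_part = math.sqrt(q_freq) * w / norm_q
        doc_part = math.sqrt(d_freq) * w / norm_d
        total += query_part * doc_part * boosts.get(t, 1.0)

    # coord_q_d = terms in both query and document / terms in query
    coord = matched / len(q_counts)
    return total * coord
```

A quick way to get a feel for it is to score the same query against a short and a long document containing the same match, for example `score(["lucene"], ["lucene", "search"], {"lucene": 9}, 1000)` against a 100-token document with a single "lucene": the shorter document should come out ahead, which matches the length-normalization point made elsewhere in this thread.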
> > > > > I did not find the formula too helpful in figuring out what exactly
> > > > > the score is trying to calculate.
> > > > >
> > > > > I want to know of a logic that can be used for translating this score
> > > > > into something that can be used for determining which Terms are more
> > > > > relevant for a given Search Request.
> > > > >
> > > > > One way would be to just assume that the higher the score, the more
> > > > > relevant the search result. But is this assumption really valid? Or
> > > > > are there any possible caveats to this?
> > > > >
> > > > > -Rishabh
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > To unsubscribe, e-mail:
> > > > > <mailto:[EMAIL PROTECTED]>
> > > > > For additional commands, e-mail:
> > > > > <mailto:[EMAIL PROTECTED]>
> > > > >