Re: RE : Re: problem undestanding the hits.score

Erick Erickson Fri, 02 Nov 2007 09:33:04 -0800

I strongly recommend against this. Simple word counts are a poor
measure of relevance. Which is why Lucene doesn't score that
way. Do you have an example showing why the default scoring is
inadequate or is this just an assumption?


It would be helpful if you gave us some idea of what you're trying
to accomplish. What is the use-case you're trying to solve? That
would generate more helpful responses I think....

Best
Erick

On 11/2/07, Jamal H Tandina <[EMAIL PROTECTED]> wrote:
>
> <<<<
>
> If you want to give priority to documents that are larger, like z1, you
>
> should change the DefaultSimilarity (at index time), more exactly the
> method:
>
>   public float lengthNorm(String fieldName, int numTerms) {
>     return (float)(1.0 / Math.sqrt(numTerms));
>   }
>
> to something like this
>
>   public float lengthNorm(String fieldName, int numTerms) {
>     return (float)(Math.sqrt(numTerms));
>
>
>   }
> >>>
>
> I want to give priority to documents that have the word we are searching
> more frequent !
>
> Thank you
>
>
>
> Jamal H Tandina <[EMAIL PROTECTED]> a écrit : Thank you  for your reply
>
> How can i change the defaultSimilarity in the indexing and the searching,
> do you have an example or an url how to set the Similarity  ?
>
> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html
>
> Thanks again
>
> Ion Badita  a écrit : Try too look at Similarity, there you will find
> thinks about the
> scoring. Your query is more "similar" with the shorter document.
> If you have 2 documents with a field body; first with words "red flower"
> and the second with just one word "flower", and search for the word
> "flower", the second document will score high because is very similar
> with the query.
>
> If you want to give priority to documents that are larger, like z1, you
> should change the DefaultSimilarity (at index time), more exactly the
> method:
>
>   public float lengthNorm(String fieldName, int numTerms) {
>     return (float)(1.0 / Math.sqrt(numTerms));
>   }
>
> to something like this
>
>   public float lengthNorm(String fieldName, int numTerms) {
>     return (float)(Math.sqrt(numTerms));
>   }
>
>
> Reindex your documents with the Similarity modified and try to search
> again. The IndexWriter has a method to set the similarity used for
> indexing.
>
>
> I hope this will help you...
>
>
> Ion
>
>
>
> Jamal jamalator wrote:
> >  Hi
> >
> > I have indexed this html document
> > =============z1========================
> >
> >
> > zo zo zo zo zo zo zo zo zo zo zo zo
>
> > zo zo zo zo zo zo zo zo zo zo zo zo
>
> > zo zo zo zo zo zo zo zo zo zo zo zo
> >
> >
> > =============z2=========================
> >
> >
> >  zo zo zo zo zo zo zo zo zo zo zo zo
>
> >  zo zo zo zo zo zo zo zo zo zo zo zo
>
> >
> >
> > =============z3==========================
> >
> >
> >  zo zo zo zo zo zo zo zo zo zo zo zo
>
> >
> >
> > =========================================
> > with this code
> >
> > Field contentK1 = new  Field("htmlcontent",httpd.getContentKeywords(),
> Field.Store.NO,Field.Index.TOKENIZED );
> > contentK1.setBoost(1/10f);  //10%
> > doc.add(contentK1);
> >
> > and when a search "zo" with luke i have (whitespaceanalyser):
> >
> > (score , id   )
> > (0,0957,z2 )
> > (0,0947,z3 )
> > (0,0938,z1)
> >
> > NORMALY the resut expected have to be z1 z2 z3
> >
> > Some One have an idea ??
> >
> > Thank you all
> >
> >
> >
> > ---------------------------------
> >   Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers
> Yahoo! Mail
> >
> >
> > ---------------------------------
> >  Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo!
> Mail
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>
>
> ---------------------------------
> Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo!
> Mail
>
>
> ---------------------------------
> Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo!
> Mail

Re: RE : Re: problem undestanding the hits.score

Reply via email to