Re: score and frequency

Brisbart Franck Mon, 07 Jun 2004 01:46:30 -0700

It seems that you don't want the length norm to be used. It's a factor that normalizes the score of a doc depending on the size of the searched field of the doc. It's the factor that gives 'ground ice' a higher score than 'ice hockey: British Sekonda Superleague Play-Off Championship: finals', because it only has 2 terms. So I suggest you override the lengthNorm method and ignore the numTokens parameter.

NB: The length norm is computed during indexing and the norms are stored in the index (in the _aaa.f# files). So you need to re-index your data, and use this similarity during indexing.
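
To make the effect concrete, here is a minimal standalone sketch (no Lucene dependency) of the length norm. It assumes the stock DefaultSimilarity formula of 1/sqrt(numTokens); the class name, the method names and the token count used for the long title are illustrative assumptions, not taken from the index in question.

```java
public class LengthNormSketch {

    // Assumed default formula: the fewer tokens a field has, the larger its norm.
    public static float defaultLengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // The suggested override: ignore numTokens so field length no longer affects the score.
    public static float flatLengthNorm(int numTokens) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int shortTitle = 2; // a two-term title such as 'ground ice'
        int longTitle = 9;  // assumed token count for the long title; depends on the analyzer

        System.out.println("default, short title: " + defaultLengthNorm(shortTitle));
        System.out.println("default, long title:  " + defaultLengthNorm(longTitle));
        System.out.println("flat, short title:    " + flatLengthNorm(shortTitle));
        System.out.println("flat, long title:     " + flatLengthNorm(longTitle));
    }
}
```

In a real Similarity subclass the same idea would go into the lengthNorm override, and it only takes effect once the data has been re-indexed with that similarity.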

Cheers,
Franck

Niraj Alok wrote:

I have set the similarity with searcher.setSimilarity and have also tried
setting the coord factor to 1.

The problem, by example: let's say I have titles to be displayed
depending on the search.
E.g. if I have "ice hockey" as the search item and use the default
similarity, my results are:

ice hockey  0.99999994
ice hockey  0.75
ice hockey  0.75
winter Olympics: hockey, ice, medallists  0.17402513
ice age  0.073680125
National Hockey League  0.020266924
Cracking the Ice Age  0.018420031
ground-ice  0.011512519
ice hockey: British Sekonda Superleague Play-Off Championship: finals  0.0069075115
(the numbers indicating the score).


But if I set the similarity to my overridden one, the results become:
ice hockey  0.99999994
ice hockey  0.75
ice hockey  0.75
ice age  0.22104037
winter Olympics: hockey, ice, medallists  0.17402513
National Hockey League  0.060800765
Cracking the Ice Age  0.055260092
ground-ice  0.034537554
ice hockey: British Sekonda Superleague Play-Off Championship: finals  0.020722535


I want all the titles which have both "ice" and "hockey" to come above the
rest (to have higher scores).
That is, I would like the results to appear like:

ice hockey
ice hockey
ice hockey
winter Olympics: hockey, ice, medallists
ice hockey: British Sekonda Superleague Play-Off Championship: finals
ice age
National Hockey League
Cracking the Ice Age
ground-ice

My overridden similarity class contains just this method:
public float coord(int overlap, int maxOverlap) {
    return 1.0f;
}

I feel it is the weight factor which is producing undesirable results. Any
help in this regard would be highly appreciated.
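
For reference, a minimal standalone sketch (no Lucene dependency) of what the coord factor does. It assumes the stock DefaultSimilarity formula of overlap / maxOverlap; the class and method names are illustrative. Under that assumption, the default coord is the factor that rewards documents matching both "ice" and "hockey", so returning a constant 1.0 removes that reward rather than strengthening it.

```java
public class CoordSketch {

    // Assumed default formula: the fraction of query terms the document matched.
    public static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // The override from the message above: every document gets the same factor.
    public static float flatCoord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int queryTerms = 2; // "ice" and "hockey"

        // A title containing both terms versus a title containing only one of them.
        System.out.println("default, both terms: " + defaultCoord(2, queryTerms));
        System.out.println("default, one term:   " + defaultCoord(1, queryTerms));
        System.out.println("flat, both terms:    " + flatCoord(2, queryTerms));
        System.out.println("flat, one term:      " + flatCoord(1, queryTerms));
    }
}
```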

Regards,
Niraj

----- Original Message -----
From: "Brisbart Franck" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, June 04, 2004 8:46 PM
Subject: Re: score and frequency

Hi,

Be careful to set the default similarity
'Similarity.setDefault(similarity)' before creating your search instance
(IndexSearcher).
If you change the default similarity afterwards, the searcher will still use
the old one.
It is better to use the 'searcher.setSimilarity' method on your searcher.

Franck


Phil brunet wrote:

Hi to all.

Maybe the term frequency is not the only parameter you need to override
to "customize" the score attributed by Lucene.

Maybe you should consider the normalisation factor, the idf and the
coord factor ?

Philippe

From: "Niraj Alok" <[EMAIL PROTECTED]>
Reply-To: "Lucene Users List" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Subject: Re: score and frequency
Date: Fri, 4 Jun 2004 15:13:32 +0530

Hi Erik,

Thanks for the suggestion.

I tried this:
public class RelevanceSimilarity extends DefaultSimilarity {
    public float tf(float freq) {
        System.out.println("discounting frequency");
        return 1.0f;
    }
}

and in my query class, I used:

Similarity.setDefault(similarity);
Hits hits = is.search(query);
for (i = 0; i < hits.length(); i++)
    result = result + hits.score(i);

However, this is still not giving me the expected result. Do I need to do
something else?


Regards,
Niraj

----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, June 04, 2004 1:55 PM
Subject: Re: score and frequency

On Jun 4, 2004, at 2:52 AM, Niraj Alok wrote:

Hi,

I am having some problems with the score of lucene.
I am trying to get the results displayed according to hits.score and
it is giving the results correctly.
However I do not want the frequency factor to be used for the
computation of the score.

Is it possible to get the score which does not have the frequency
factor in it?

Have a look at the javadocs for Similarity. DefaultSimilarity is used
unless otherwise specified. You could subclass that and override this:

  public float tf(float freq) {
    return (float)Math.sqrt(freq);
  }

and return 1.0.  This might give you the effect you want.
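
A minimal standalone sketch (no Lucene dependency) of the difference this makes. The default formula is the sqrt(freq) shown above; the class and method names are illustrative assumptions.

```java
public class TfSketch {

    // Default formula quoted above: the score grows with the square root of the term frequency.
    public static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // The suggested override: a term counts the same however often it occurs.
    public static float flatTf(float freq) {
        return 1.0f;
    }

    public static void main(String[] args) {
        // Compare a term occurring once with a term occurring four times.
        System.out.println("default, freq 1: " + defaultTf(1f));
        System.out.println("default, freq 4: " + defaultTf(4f));
        System.out.println("flat, freq 1:    " + flatTf(1f));
        System.out.println("flat, freq 4:    " + flatTf(4f));
    }
}
```

With the flat version the frequency no longer separates documents, so whatever ordering remains comes from the other factors in the score.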

Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--
Franck Brisbart
R&D
http://www.kelkoo.com

