Re: Question about scoring normalisation

Karl Koch Sun, 06 Nov 2005 11:43:39 -0800

Hello Ira,

I am not sure if I know exactly what pivoted normalisation is. I can tell
you what I do, in the meantime I will have a look to your paper and I hope
that we can discuss this issue further.


I work in personalised searching where a second model - a user model - 
expresses extra relevance of a document; however from a different
prespective. I also value the use of TF/IDF to represent the content sided
view on content relevance. Therefore, I would like to combine both scores
(the one from Lucene and the one from my model) into a combined score.
Furthermore, I need to normalise this score so it cannot get higher than
1.0. At the moment, I have a visulisation scheme that works on this premise.

In Lucene, scores seem to be normalised only if they exceed the 1.0 maximum
border. If they do not exceed this max value, they are left where they are. 

Based on that I have now solved my problem to combine my two scores (Lucene
and mine) and if they exceed, I normalise the scores like Lucene does. I
think this is the most accurate think I could do in this case where I do not
violoate the overall meaning of scoring for the user. 

I realise that if I always normalise (to a 0.0 to 1.0 range) I will
introduce a dangerous feature that basically boosts a bunch of low scored
documents (e.g. all between 0.1 and 0.2) unnecessarily high (in this case
right up to between 0.5 and 1.0).

What would you say about that? Does is make sense?

Kind Regards,
Karl


> --- Ursprüngliche Nachricht ---
> Von: Ira Goldstein <[EMAIL PROTECTED]>
> An: "Karl Koch" <[EMAIL PROTECTED]>
> Betreff: Re: Question about scoring normalisation
> Datum: Sun, 06 Nov 2005 08:08:59 -0500
> 
> Karl --
>   Hi.  I've been thinking about adding a pivoted normalization to Lucene
> (see
> attached paper).  I've just started to look at the current code to see how
> the
> tf-idf (sum of squares?) has been implemented.  I wanted to do that before
> begining any coding?  
>   Is pivoted normalization the sort of thing you were asking about?
>   Take care
> --Ira
> 
> ------ Original Message ------
> Received: Sat, 05 Nov 2005 03:26:12 PM EST
> From: "Karl Koch" <[EMAIL PROTECTED]>
> To: "User " <java-user@lucene.apache.org>
> Subject: Question about scoring normalisation
> 
> > Hello all,
> > 
> > I am wondering how many of you actually work with own scoring mechanism
> > (overwriting Lucenes standard scoring) and how many of you do work on
> how
> to
> > normalise this score. 
> > 
> > I would like to add a second score on top of Lucenes TF/IDF score. The
> > resulting score is most likely higher then 1.0. However, the score
> should
> be
> > between 0.0 and 1.0. What is the best way to do that? If Lucene is
> > normalising its score (if no boosting is applied) to a maximium of 1.0,
> how
> > is this done (in Lucene 1.2 and/or beyond) ?
> > 
> > Regards,
> > Karl
> > 
> > 
> > -- 
> > Highspeed-Freiheit. Bei GMX supergünstig, z.B. GMX DSL_Cityflat,
> > DSL-Flatrate für nur 4,99 Euro/Monat*  http://www.gmx.net/de/go/dsl
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> > 
> > 
> 
> 
> 

-- 
Highspeed-Freiheit. Bei GMX supergünstig, z.B. GMX DSL_Cityflat,
DSL-Flatrate für nur 4,99 Euro/Monat*  http://www.gmx.net/de/go/dsl

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Question about scoring normalisation

Reply via email to