RE: Implementation of a ScoreObject ?

Robichaud, Jean-Philippe Tue, 03 May 2005 15:02:24 -0700

I would gladly help.  I fear that my Java skills may be a little
limited for the task, but hey, why not.  I would certainly need some
guidance on where to start.  I'm just too unfamiliar with complex
query structures and the scoring methodology.  While I'm pretty sure reading
the entire code would be a great exercise, I'm not sure I can afford the
time needed to learn everything the hard way...

Doug, do you have any pointers on where I should start?

Thanks, 

Jp

-----Original Message-----
From: Chuck Williams [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 27, 2005 12:30 PM
To: java-user@lucene.apache.org
Subject: Re: Implementation of a ScoreObject ?

Robichaud, Jean-Philippe wrote:

>Probably the simplest/ideal schema of the ScoreObject would be something
>like a hashtable with Term being the keys and a TermScoreObject the value.
>The TermScoreObject would be filled at search time (if asked) and would
>contain all values used in the calculation of the "similarity score".  That
>way we could easily know what is the contribution of a specific term to the
>overall score.  
>  
>
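A minimal sketch of the structure proposed above, assuming terms are keyed as plain "field:text" strings rather than Lucene's `Term` class so the example compiles on its own. `TermScoreObject`, `ScoreObject` and their methods are hypothetical names. The per-term factors roughly mirror the ones Lucene's `Explanation` output reports for a `TermQuery` (queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm); query-level factors such as coord are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-term record holding the factors behind one term's share
// of a document's score. Not an existing Lucene class.
class TermScoreObject {
    final float tf;        // tf(termFreq) for this term in this document
    final float idf;       // idf(docFreq) for this term
    final float boost;     // query-time boost on the term
    final float queryNorm; // query-level normalization factor
    final float fieldNorm; // index-time norm (length norm and boosts)

    TermScoreObject(float tf, float idf, float boost, float queryNorm, float fieldNorm) {
        this.tf = tf;
        this.idf = idf;
        this.boost = boost;
        this.queryNorm = queryNorm;
        this.fieldNorm = fieldNorm;
    }

    // queryWeight * fieldWeight = (boost * idf * queryNorm) * (tf * idf * fieldNorm)
    float contribution() {
        float queryWeight = boost * idf * queryNorm;
        float fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }
}

// Hypothetical ScoreObject: maps each term ("field:text") to its details,
// to be filled in at search time only when the caller asks for it.
class ScoreObject {
    private final Map<String, TermScoreObject> terms =
        new LinkedHashMap<String, TermScoreObject>();

    void record(String term, TermScoreObject details) {
        terms.put(term, details);
    }

    // Contribution of one term, or 0 if the term did not match this document.
    float contributionOf(String term) {
        TermScoreObject t = terms.get(term);
        return t == null ? 0f : t.contribution();
    }

    // Sum of per-term contributions (coord and other query-level factors omitted).
    float total() {
        float sum = 0f;
        for (TermScoreObject t : terms.values()) {
            sum += t.contribution();
        }
        return sum;
    }
}
```

Keeping the raw factors rather than only the product is what lets a caller answer "how much did this term contribute, and why".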
Jean-Philippe,

Some of us have talked about a score object in the past and agree that
this would be a very good thing.  In addition to providing a sounder
foundation for explanation, such a mechanism could help to provide
better scoring.  For example, one limitation in Lucene now is that score
normalization is ad hoc: all scores are divided by the highest score
IF the highest score is greater than 1, and whether or not the highest
unnormalized score is greater than 1 is essentially arbitrary.  As a
result, scores across multiple searches are not comparable (even though
many applications do compare them, and get meaningless results).  With a
score object, one would like to keep additional information, e.g., a
count of boost-weighted query terms and the boost-weighted percentage of
such terms that were matched by each result.  This could provide a more
intrinsic normalization scheme, e.g., defining the highest score as the
boost-weighted percentage of matched query terms and dividing all scores
by the same constant to achieve this.  (Some additional refinements are
necessary to handle things like MultiTermQuery instances, which rewrite
to BooleanQuery instances with coord disabled; such a list of alternate
query terms should count as one term.)
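To make the contrast concrete, here is a hedged sketch of both schemes as described above. `adHoc` reproduces the "divide by the top score only if it exceeds 1" rule. `coverage` and `intrinsic` are one hypothetical reading of the proposed scheme: each query slot carries a boost, a slot built from a MultiTermQuery expansion lists all its alternate terms but counts only once, and every score is scaled by a single constant so that the top hit's score equals its boost-weighted fraction of matched slots. None of these names are Lucene APIs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Normalization {
    // Current rule as described: divide every score by the top score,
    // but only if the top score is greater than 1.
    static float[] adHoc(float[] raw) {
        float max = 0f;
        for (float s : raw) max = Math.max(max, s);
        float scale = max > 1f ? 1f / max : 1f;
        float[] out = new float[raw.length];
        for (int i = 0; i < raw.length; i++) out[i] = raw[i] * scale;
        return out;
    }

    // Hypothetical query slot: one boost shared by a set of alternate terms.
    // A MultiTermQuery expansion (e.g. a wildcard) becomes a single slot.
    static final class Slot {
        final float boost;
        final Set<String> alternates;

        Slot(float boost, String... alternates) {
            this.boost = boost;
            this.alternates = new HashSet<String>(Arrays.asList(alternates));
        }
    }

    // Boost-weighted fraction of slots matched by one hit; a slot counts
    // once if any of its alternates matched.
    static float coverage(List<Slot> query, Set<String> matchedTerms) {
        float total = 0f, matched = 0f;
        for (Slot slot : query) {
            total += slot.boost;
            for (String alt : slot.alternates) {
                if (matchedTerms.contains(alt)) {
                    matched += slot.boost;
                    break;
                }
            }
        }
        return total == 0f ? 0f : matched / total;
    }

    // Scale all scores by one constant so the top hit's score equals its
    // coverage (equivalent to dividing by topScore / topCoverage).
    static float[] intrinsic(float[] raw, float[] coverages) {
        int top = 0;
        for (int i = 1; i < raw.length; i++) if (raw[i] > raw[top]) top = i;
        float[] out = new float[raw.length];
        if (raw.length == 0 || raw[top] == 0f) return out;
        float scale = coverages[top] / raw[top];
        for (int i = 0; i < raw.length; i++) out[i] = raw[i] * scale;
        return out;
    }
}
```

With raw scores {4, 2}, `adHoc` yields {1, 0.5}, while raw scores {0.5, 0.25} are left untouched; the same relative match thus ends up with different absolute scores depending on the search. Under `intrinsic`, the top hit's score instead reflects how much of the (boost-weighted) query it actually matched.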

That is one additional example of what score objects could be used
for.  A general mechanism should provide for easy extension such that
different scoring classes could collect, record and aggregate different 
information for various purposes.
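One possible shape for such an extensible mechanism, sketched under the assumption that each kind of statistic defines its own aggregation rule: a scorer attaches a typed detail object, and details of the same type coming from sibling clauses are folded together via `combine`. `ScoreDetail`, `ScoreDetails`, `MatchCount` and `MaxTermScore` are hypothetical illustrations, not existing Lucene classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: any per-hit statistic a scorer wants to record, together
// with the rule for aggregating two values of the same kind.
interface ScoreDetail<T extends ScoreDetail<T>> {
    T combine(T other);
}

// Example detail: counts matched clauses (sums across clauses).
final class MatchCount implements ScoreDetail<MatchCount> {
    final int matched, total;

    MatchCount(int matched, int total) {
        this.matched = matched;
        this.total = total;
    }

    public MatchCount combine(MatchCount other) {
        return new MatchCount(matched + other.matched, total + other.total);
    }
}

// Example detail with a different aggregation rule: keep the maximum.
final class MaxTermScore implements ScoreDetail<MaxTermScore> {
    final float value;

    MaxTermScore(float value) {
        this.value = value;
    }

    public MaxTermScore combine(MaxTermScore other) {
        return new MaxTermScore(Math.max(value, other.value));
    }
}

// Typed container: unrelated scorers can attach their own detail types
// without knowing about each other.
final class ScoreDetails {
    private final Map<Class<?>, ScoreDetail<?>> details =
        new HashMap<Class<?>, ScoreDetail<?>>();

    // Record a detail, folding it into any existing detail of the same type.
    <T extends ScoreDetail<T>> void add(Class<T> type, T detail) {
        T existing = get(type);
        details.put(type, existing == null ? detail : existing.combine(detail));
    }

    // Returns the aggregated detail of this type, or null if none was recorded.
    <T extends ScoreDetail<T>> T get(Class<T> type) {
        return type.cast(details.get(type));
    }
}
```

Keying the container by detail class means a new scoring class can introduce its own statistic without changing the container or any other scorer; a BooleanQuery-style scorer would simply call `add` once per matching clause.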

I've wanted to work on this for a while but haven't found the time.  I 
know Doug has had a score object mechanism on his radar screen (he first 
suggested this approach to me as a solution to the normalization issue 
I'm concerned about).  I expect he has a good approach in mind.  It 
would be great if you'd tackle this -- I'd be happy to help if that 
makes sense.

Chuck

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
