I would gladly help. I fear that my Java skills are probably a little limited for the task, but hey, why not. I would certainly need some guidance on where to start. I'm just too unfamiliar with complex query structures and the scoring methodology. While I'm pretty sure reading the entire code would be a great exercise, I'm not sure I can afford the time needed to learn everything the hard way...
Doug, do you have any clues about where I can start?

Thanks,
Jp

-----Original Message-----
From: Chuck Williams [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 27, 2005 12:30 PM
To: java-user@lucene.apache.org
Subject: Re: Implementation of a ScoreObject ?

Robichaud, Jean-Philippe wrote:

>Probably the simplest/ideal schema of the ScoreObject would be something
>like a hashtable with Term being the keys and a TermScoreObject the value.
>The TermScoreObject would be filled at search time (if asked) and would
>contain all values used in the calculation of the "similarity score". That
>way we could easily know what is the contribution of a specific term to the
>overall score.
>

Jean-Philippe,

Some of us have talked about a score object in the past and agree that this would be a very good thing. In addition to providing a sounder foundation for explanation, such a mechanism could help to provide better scoring.

For example, one limitation in Lucene now is that score normalization is ad hoc -- all scores are divided by the highest score IF the highest score is greater than 1, and whether or not the highest unnormalized score is greater than 1 is pretty much random. This yields a situation where scores across multiple searches are not comparable (notwithstanding many applications that do compare them, getting random results).

With a score object, one would like to keep additional information, e.g., a count of boost-weighted query terms and the boost-weighted percentage of such terms that were matched by each result. This could provide a more intrinsic normalization scheme, e.g., defining the highest score as the boost-weighted percentage of matched query terms and dividing all scores by the same constant to achieve this. (Some additional refinements are necessary to handle things like MultiTermQuery, which rewrites to a BooleanQuery with coord disabled -- such lists of alternate query terms should count as one term.)
That is one additional example of something score objects could be used for. A general mechanism should provide for easy extension such that different scoring classes could collect, record and aggregate different information for various purposes.

I've wanted to work on this for a while but haven't found the time. I know Doug has had a score object mechanism on his radar screen (he first suggested this approach to me as a solution to the normalization issue I'm concerned about). I expect he has a good approach in mind. It would be great if you'd tackle this -- I'd be happy to help if that makes sense.

Chuck

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
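
As a rough illustration of the idea discussed in this thread, here is a minimal sketch in plain Java of a per-hit score object keyed by query term, together with the boost-weighted normalization described above. It does not use any Lucene classes and is not an existing Lucene API: ScoreObjectSketch, TermScore, ScoreObject, record, matchedFraction and normalize are all hypothetical names, terms are plain strings rather than Lucene Term objects, and the raw per-term contributions are made-up numbers. It also ignores the MultiTermQuery refinement mentioned in the message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these types exist in Lucene. */
public class ScoreObjectSketch {

    /** Factors recorded for one query term in one hit (assumed fields). */
    public static final class TermScore {
        public final float boost;     // query-time boost of the term
        public final boolean matched; // whether the term matched this document
        public final float raw;       // the term's contribution to the raw score

        public TermScore(float boost, boolean matched, float raw) {
            this.boost = boost;
            this.matched = matched;
            this.raw = raw;
        }
    }

    /** Per-hit score object: query term text -> recorded factors. */
    public static final class ScoreObject {
        public final Map<String, TermScore> terms = new LinkedHashMap<>();

        public void record(String term, float boost, boolean matched, float raw) {
            terms.put(term, new TermScore(boost, matched, raw));
        }

        /** Unnormalized score: sum of the per-term contributions. */
        public float rawScore() {
            float sum = 0f;
            for (TermScore t : terms.values()) sum += t.raw;
            return sum;
        }

        /** Boost-weighted fraction of query terms matched by this hit. */
        public float matchedFraction() {
            float total = 0f, hit = 0f;
            for (TermScore t : terms.values()) {
                total += t.boost;
                if (t.matched) hit += t.boost;
            }
            return total == 0f ? 0f : hit / total;
        }
    }

    /**
     * Scales every raw score by one constant so that the top hit's score
     * equals its boost-weighted matched fraction.
     * Assumes hits are already sorted by descending raw score.
     */
    public static float[] normalize(ScoreObject[] hits) {
        float[] out = new float[hits.length];
        if (hits.length == 0) return out;
        float topRaw = hits[0].rawScore();
        if (topRaw == 0f) return out;
        float scale = hits[0].matchedFraction() / topRaw;
        for (int i = 0; i < hits.length; i++) {
            out[i] = hits[i].rawScore() * scale;
        }
        return out;
    }

    public static void main(String[] args) {
        ScoreObject a = new ScoreObject();
        a.record("lucene", 2f, true, 3f);
        a.record("score", 1f, true, 1f);
        a.record("object", 1f, false, 0f);

        ScoreObject b = new ScoreObject();
        b.record("lucene", 2f, false, 0f);
        b.record("score", 1f, true, 1f);
        b.record("object", 1f, true, 1f);

        float[] norm = normalize(new ScoreObject[] { a, b });
        System.out.println(norm[0]); // 0.75
        System.out.println(norm[1]); // 0.375
    }
}
```

Because each hit keeps its own map, the contribution of a single term can be read back with something like a.terms.get("lucene").raw, which is the kind of per-term visibility asked for in the original message. A real implementation would have to decide where in the search path these values get recorded, which this sketch does not attempt.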