On Monday 18 September 2006 23:08, Andy Liu wrote:
> For multi-word queries, I would like to reward documents that contain a more
> even distribution of each word and penalize documents that have a skewed
> distribution. For example, if my search query is:
>
> +content:fast +content:car
>
> I would prefer a document that contains each word an equal number of times
> over a document that contains the word "fast" 100 times and the word "car" 1
> time. In other words, I would like to compare the scores of each
> BooleanQuery term and adjust the score according to the distribution.
>
> Can somebody point me in the right direction as to how I would implement
> this?

It's already there in DefaultSimilarity.tf(), which is the square root of the
term frequency. Because the square root grows more slowly than the frequency
itself, for the same total number of occurrences an even distribution sums to
a higher value than a skewed one:

(sqrt(1) + sqrt(1)) > (sqrt(0) + sqrt(2))

Regards,
Paul Elschot
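
To make the inequality above concrete, here is a minimal sketch in plain Java. It is not Lucene's scoring code: it only isolates the square-root tf factor and ignores idf, norms, coord and boosts, which also contribute to the real score. The class name TfDistribution and the helper sumTf are made up for illustration.

```java
public class TfDistribution {

    // Square-root term-frequency factor, as described for DefaultSimilarity.tf().
    public static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Hypothetical helper: sums the tf factor over the per-term frequencies
    // of a single document. All other scoring factors are left out.
    public static float sumTf(int... freqs) {
        float sum = 0f;
        for (int freq : freqs) {
            sum += tf(freq);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two occurrences in total: spread evenly vs. all on one term.
        System.out.println("even   (1, 1):   " + sumTf(1, 1));
        System.out.println("skewed (0, 2):   " + sumTf(0, 2));

        // The "fast" x 100, "car" x 1 case against a near-even split of 101.
        System.out.println("skewed (100, 1): " + sumTf(100, 1));
        System.out.println("even   (50, 51): " + sumTf(50, 51));
    }
}
```

In both pairs the evenly distributed document gets the larger sum, so no extra work is needed for the basic effect. If a stronger penalty for skew is wanted, one option would be a custom Similarity whose tf() grows even more slowly than the square root, but that is a design choice beyond what is shown here.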