: This is not quite what I was talking about. I was talking about documents
: with a single field. I want the text "Badgers are mammals. Badgers are cute"
: to score higher than the text "Badger Badger" for the term query
: "text:badger".
: Ideally, what I want is to add another factor to the scoring at index time,
: a "sparsity factor" which should cancel out the term frequency as the
: average distance between terms nears 1.

something else you may want to consider: you can omitNorms (or alter the
lengthNorm function) when indexing so that longer fields aren't penalized
compared to shorter fields ... in which case a field containing
"Badger Badger" won't score *higher* than "Badgers are mammals. Badgers are
cute", because it won't get the short lengthNorm bonus ... if it met your use
case, you could even make *longer* docs get a higher lengthNorm.

: Sorry about the weird math, I just mean (as I said above) that the sparsity
: factor should cancel out the tf completely if avg_d<=1 and become 1 as avg_d
: gets larger.

it wouldn't exactly match your math, but a simpler approach to consider that
might be equally effective would be counting the number of unique terms in
this field at index time (or the ratio of unique terms to total terms), and
then using that number as the fieldBoost (or indexing it as a numeric field
that you build a function query on) ... then you can reward docs that have a
higher number of unique terms, and penalize docs that only have a few terms
repeated over and over.

-Hoss
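
For illustration, here is a minimal sketch of the unique-terms-to-total-terms ratio suggested above. It is plain Java with no Lucene dependency. The class name UniqueTermRatio, the method ratio, and the naive lowercase/split tokenizer are all assumptions made for this example; in a real index you would presumably count the tokens produced by the same Analyzer the field uses, and stemming would change the counts.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class UniqueTermRatio {

    // Returns (number of distinct tokens) / (total number of tokens),
    // or 0.0 for text with no tokens. Tokenization here is a naive
    // stand-in: lowercase, then split on anything that is not a letter
    // or digit. No stemming is applied.
    static double ratio(String text) {
        Set<String> unique = new HashSet<>();
        int total = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            total++;
            unique.add(token);
        }
        return total == 0 ? 0.0 : (double) unique.size() / total;
    }

    public static void main(String[] args) {
        // The value computed here is what you would feed in as the field
        // boost, or index as a separate numeric field, at index time.
        System.out.println(ratio("Badger Badger"));
        System.out.println(ratio("Badgers are mammals. Badgers are cute"));
    }
}
```

With this tokenizer, "Badger Badger" has 1 unique token out of 2 (ratio 0.5), while "Badgers are mammals. Badgers are cute" has 4 unique tokens out of 6 (ratio about 0.67), so the longer, more varied text gets the larger factor.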