>>> But I don't like baking in search concepts at index time...
>
> Many scoring models are possible if you store enough stats in the
> index.

in general the missing stats seem to fit in two buckets/categories:

1) length normalization pivot: average length in bytes, terms, unique terms
2) term frequency normalization factor: max or average tf for the field.

you never need more than one of each category for the same field.

one approach would be for the search-time similarity to simply use these
generic names (i guess they could get some placeholder value if they are
not available) and at index time, you make sure you put the one you want
(or none at all) in the "bucket"

-- 
Robert Muir
rcm...@gmail.com
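
A rough sketch of the two-bucket idea, in case it helps make it concrete. Everything here is hypothetical: `FieldStats`, `GenericSimilarity`, the `SLOPE` constant, and the scoring formula are illustrative assumptions, not Lucene classes or Lucene's actual scoring. Each field carries at most one value per bucket, chosen at index time, and the search-time code only knows the generic names and falls back to a neutral placeholder when a bucket is empty.

```java
import java.util.OptionalDouble;

// Hypothetical per-field stats: one optional value per bucket.
// Index-time code decides what goes in each bucket (or nothing at all).
final class FieldStats {
    final OptionalDouble lengthPivot; // bucket 1: e.g. average length in bytes, terms, or unique terms
    final OptionalDouble tfNorm;      // bucket 2: e.g. max or average tf for the field

    FieldStats(OptionalDouble lengthPivot, OptionalDouble tfNorm) {
        this.lengthPivot = lengthPivot;
        this.tfNorm = tfNorm;
    }
}

// Hypothetical search-time similarity that only uses the generic bucket names.
final class GenericSimilarity {
    // Assumed slope for a simple pivoted length normalization (illustrative only).
    static final double SLOPE = 0.25;

    static double score(double tf, double docLength, FieldStats stats) {
        // Placeholder when bucket 1 is empty: use the document's own length as
        // the pivot, which makes the length factor exactly 1 (no normalization).
        double pivot = stats.lengthPivot.orElse(docLength);
        // Placeholder when bucket 2 is empty: divide by 1, leaving tf unchanged.
        double tfNorm = stats.tfNorm.orElse(1.0);

        double normalizedTf = tf / tfNorm;
        double lengthFactor = 1.0 / ((1.0 - SLOPE) + SLOPE * (docLength / pivot));
        return normalizedTf * lengthFactor;
    }
}
```

With this shape, swapping "average length in terms" for "average length in unique terms" is purely an index-time choice; the search-time code does not change.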