>>> But I don't like baking in search concepts at index time...
>>
> Many scoring models are possible if you store enough stats in the
> index.
>

In general, the missing stats seem to fall into two buckets (categories):

1) length normalization pivot: average field length in bytes, terms, or unique terms
2) term frequency normalization factor: max or average tf for the field
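To make the two buckets concrete, here is a minimal Java sketch of how a scoring function might consume one value from each. The class and method names are hypothetical (this is not Lucene's Similarity API), and the formulas are textbook choices assumed for illustration: pivoted length normalization with a slope parameter (bucket 1), augmented tf against the field's max tf, and a log-tf ratio against the field's average tf (both bucket 2).

```java
/**
 * Hypothetical helper (not part of Lucene): shows where each kind of
 * per-field statistic would be used by a scoring function.
 */
public final class FieldNorms {
    private FieldNorms() {}

    /**
     * Bucket 1: pivoted length normalization.
     * A document exactly at the average length gets 1.0; longer documents get
     * a larger divisor, shorter ones a smaller one. docLength and avgLength
     * must use the same unit (bytes, terms or unique terms).
     */
    public static double pivotedLengthNorm(double docLength, double avgLength, double slope) {
        return (1.0 - slope) + slope * (docLength / avgLength);
    }

    /** Bucket 2 (max tf variant): augmented tf, scaled into (0.5, 1.0]. */
    public static double augmentedTf(double tf, double maxTf) {
        return 0.5 + 0.5 * (tf / maxTf);
    }

    /** Bucket 2 (average tf variant): log tf relative to the field's average tf. */
    public static double logAvgTf(double tf, double avgTf) {
        // assumes tf >= 1 and avgTf >= 1 so both logs are non-negative
        return (1.0 + Math.log(tf)) / (1.0 + Math.log(avgTf));
    }
}
```

Either function in bucket 2 needs exactly one statistic per field, which matches the point that you never need more than one value of each kind.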

You never need more than one of each category for the same field. One
approach would be for the search-time similarity to simply use these
generic names (I guess they could get some placeholder value if they
are not available), and at index time you make sure you put the one
you want (or none at all) in the "bucket".
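The "bucket" idea could look roughly like the following Java sketch. Everything here is hypothetical and assumed for illustration (the class name, the Bucket enum, and the put/get methods are not Lucene APIs): the indexing side stores whichever statistic it chose under a generic bucket name, and the search side reads it by that name, falling back to a caller-supplied placeholder when the bucket is empty.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: at most one value per bucket per field.
 * The indexer decides WHICH statistic goes in; the similarity only
 * knows the generic bucket name.
 */
public final class FieldStatsBuckets {
    /** The two generic buckets. */
    public enum Bucket { LENGTH_PIVOT, TF_NORM_FACTOR }

    // field name -> (bucket -> value)
    private final Map<String, Map<Bucket, Double>> stats = new HashMap<>();

    /** Index time: fill a bucket (overwrites, so a field never holds two values for one bucket). */
    public void put(String field, Bucket bucket, double value) {
        stats.computeIfAbsent(field, f -> new EnumMap<>(Bucket.class)).put(bucket, value);
    }

    /** Search time: read by generic name; return the placeholder if nothing was stored. */
    public double get(String field, Bucket bucket, double placeholder) {
        Map<Bucket, Double> perField = stats.get(field);
        if (perField == null) {
            return placeholder;
        }
        return perField.getOrDefault(bucket, placeholder);
    }
}
```

For example, an indexer configured for pivoted unique normalization might call `put("body", Bucket.LENGTH_PIVOT, avgUniqueTerms)`, while the similarity just calls `get("body", Bucket.LENGTH_PIVOT, 1.0)` without knowing whether bytes, terms, or unique terms were stored.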


-- 
Robert Muir
rcm...@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
