Yonik Seeley wrote:
Scoring recap... I think I've seen 4 different types of scoring
mentioned in this thread for a term expanding query on a single field:
1) query_boost
2) query_boost * (field_boost * lengthNorm)
3) query_boost * (field_boost * lengthNorm) * tf(t in q)
4) query_boost * (field_boost * lengthNorm) * tf(t in q) * idf(t in q)
1 & 2 can be done with ConstantScoreQuery
4 is currently done via rewrite to BooleanQuery and limiting the
number of terms.
3 is unimplemented AFAIK.

3 is easy to implement as a subcase of 4, no?
The challenge is to implement 3 or 4 efficiently for very large queries
w/o using gobs of RAM. One option is to keep a score per document,
which makes RAM use proportional to the size of the collection (or at
least to the number of non-zero matches, if a sparse representation is
used). The other, as in 4, is to keep RAM use proportional to the number
of terms in the query (with a large constant: an I/O buffer).
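
A rough sketch of the score-per-document option, to make the trade-off
concrete. This is a standalone illustration, not Lucene code:
ScoreAccumulator, addTerm and the plain int[]/float[] postings are
made-up names, norms[doc] stands in for field_boost * lengthNorm, and
tf is assumed to be sqrt(freq). Passing idf = 1 gives case 3, which is
the sense in which 3 is a subcase of 4.

```java
/** Hypothetical sketch: one float per document, summed over all expanded terms. */
final class ScoreAccumulator {
    // RAM use is proportional to maxDoc, independent of the number of terms.
    private final float[] scores;

    ScoreAccumulator(int maxDoc) {
        scores = new float[maxDoc];
    }

    /**
     * Adds one term's contribution to every document in its postings.
     * docs[i] is a document id and freqs[i] the term's frequency in it.
     * Pass idf = 1.0f to get scoring type 3 instead of type 4.
     */
    void addTerm(int[] docs, int[] freqs, float idf, float[] norms, float queryBoost) {
        for (int i = 0; i < docs.length; i++) {
            int doc = docs[i];
            // Assumption: tf is the square root of the raw frequency.
            float tf = (float) Math.sqrt(freqs[i]);
            scores[doc] += queryBoost * norms[doc] * tf * idf;
        }
    }

    /** Accumulated score for a document; 0 if no term matched it. */
    float score(int doc) {
        return scores[doc];
    }
}
```

Each term's postings are read once and then dropped, so only the float
array (4 bytes per document) stays resident. A sparse map keyed by
document id would instead cost memory proportional to the number of
matching documents.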
Doug