Yonik Seeley wrote:
> Totally untested, but here is a hack at what the scorer might look
> like when the number of terms is large.
Looks plausible to me.
You could instead use a byte[maxDoc] and encode/decode floats as you
store and read them, to use a lot less RAM.
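
Something along these lines, as an untested sketch (ByteScoreArray is
just a made-up name; the crude linear quantization here could be
replaced by the norm encoding already in Similarity,
encodeNorm/decodeNorm):

// Illustrative only: one byte per doc instead of one float per doc.
class ByteScoreArray {
  private final byte[] scores;     // 1 byte/doc instead of 4
  private final float maxScore;    // assumed upper bound on scores

  ByteScoreArray(int maxDoc, float maxScore) {
    this.scores = new byte[maxDoc];
    this.maxScore = maxScore;
  }

  // quantize a float score into 8 bits
  void set(int doc, float score) {
    int q = Math.round((score / maxScore) * 255f);
    scores[doc] = (byte) Math.min(255, Math.max(0, q));
  }

  // recover an approximate float score
  float get(int doc) {
    return ((scores[doc] & 0xFF) / 255f) * maxScore;
  }
}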
> // could also use a bitset to keep track of docs in the set...
I think that is probably a very important optimization.
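
Here is a sketch of how the bitset might drive iteration, building on
the ByteScoreArray sketch above (again, illustrative names only, not
the actual patch):

import java.util.BitSet;

// The BitSet marks set membership (1 bit/doc); the byte[] holds the
// quantized scores (8 bits/doc).
class BitSetDocIterator {
  private final BitSet docs;           // which docs matched
  private final ByteScoreArray scores; // their (approximate) scores
  private int doc = -1;                // current doc id

  BitSetDocIterator(BitSet docs, ByteScoreArray scores) {
    this.docs = docs;
    this.scores = scores;
  }

  // advance to the next matching doc; false when exhausted
  boolean next() {
    doc = docs.nextSetBit(doc + 1);
    return doc != -1;
  }

  int doc() { return doc; }

  float score() { return scores.get(doc); }
}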
If you implemented both of these suggestions, this would use 9 bits/doc
(an 8-bit byte plus a bit in the bitset) instead of 33 bits/doc (a
32-bit float plus a bit). With a 100M doc index, that would be the
difference between roughly 112MB/query and 412MB/query. The classic
term-expanding approach uses perhaps 2k/term. So, with a 100M document
index, the byte-array approach uses less memory for queries which
expand to more than about 56k terms. The float-array method uses less
memory for queries with more than about 206k terms.
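
Spelling out that arithmetic (purely a restatement of the figures
above, assuming 2000 bytes per expanded term):

// Back-of-envelope check of the figures above (illustrative only).
public class QueryMemoryEstimate {
  public static void main(String[] args) {
    long maxDoc = 100000000L;               // 100M doc index
    long bytesPerTerm = 2000;               // ~2k per expanded term

    long floatBytes = maxDoc * 33 / 8;      // float[] + bitset: 33 bits/doc
    long byteBytes  = maxDoc * 9 / 8;       // byte[]  + bitset:  9 bits/doc

    System.out.println("float approach: " + floatBytes / 1000000 + " MB/query"); // ~412
    System.out.println("byte approach:  " + byteBytes  / 1000000 + " MB/query"); // ~112

    // break-even number of expanded terms vs. the classic approach
    System.out.println("byte array wins above  " + byteBytes  / bytesPerTerm + " terms"); // ~56k
    System.out.println("float array wins above " + floatBytes / bytesPerTerm + " terms"); // ~206k
  }
}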
Doug