Yonik Seeley wrote:
Totally untested, but here is a hack at what the scorer might look
like when the number of terms is large.

Looks plausible to me.

You could instead use a byte[maxDoc] and encode/decode floats as you store and read them, to use a lot less RAM.
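
Something along these lines (a minimal, untested sketch; the linear
quantization and the maxScore bound are my assumptions here, and Lucene's
Similarity.encodeNorm()/decodeNorm() would be another way to pack a float
into a byte):

  class ByteScoreArray {
    private final byte[] scores;   // one encoded score byte per document
    private final float maxScore;  // assumed known upper bound on scores

    ByteScoreArray(int maxDoc, float maxScore) {
      this.scores = new byte[maxDoc];
      this.maxScore = maxScore;
    }

    void set(int doc, float score) {
      // quantize [0, maxScore] linearly into 0..255
      int q = Math.round((score / maxScore) * 255f);
      scores[doc] = (byte) Math.max(0, Math.min(255, q));
    }

    float get(int doc) {
      return ((scores[doc] & 0xFF) / 255f) * maxScore;
    }
  }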

  // could also use a bitset to keep track of docs in the set...

I think that is probably a very important optimization.
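
For instance (again just an untested sketch, not the actual patch): keep a
java.util.BitSet alongside the score array, set a bit for every doc that
matches at least one term, and let the scorer walk the bitset rather than
every doc id:

  import java.util.BitSet;

  class DocScoreSet {
    final BitSet matched;  // one bit per doc: is this doc in the result set?
    final byte[] scores;   // encoded score per doc (see the byte[] sketch above)

    DocScoreSet(int maxDoc) {
      matched = new BitSet(maxDoc);
      scores = new byte[maxDoc];
    }

    void add(int doc, byte encodedScore) {
      matched.set(doc);
      scores[doc] = encodedScore;
    }

    // next matching doc at or after the given doc, or -1 if there is none
    int nextMatch(int doc) {
      return matched.nextSetBit(doc);
    }
  }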

If you implemented both of these suggestions, this would use 9 bits/doc (a byte for the encoded score plus one bit in the bitset), instead of 33 bits/doc (a float plus one bit). With a 100M doc index, that would be the difference between roughly 112MB/query and 412MB/query. The classic term-expanding approach uses perhaps 2k/term. So, with a 100M document index, the byte-array approach uses less memory for queries which expand to more than about 56k terms. The float-array method uses less memory for queries with more than about 206k terms.
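
For what it's worth, a quick sanity check of that arithmetic (assuming 2,000
bytes per expanded term and decimal megabytes):

  public class MemoryBreakEven {
    public static void main(String[] args) {
      long maxDoc = 100000000L;      // 100M docs
      long bytesPerTerm = 2000L;     // rough cost of classic term expansion

      long byteApproach = maxDoc * 9 / 8;    // byte + bitset: ~112.5 MB
      long floatApproach = maxDoc * 33 / 8;  // float + bitset: ~412.5 MB

      System.out.println("byte+bitset:  " + byteApproach / 1000000 + " MB, break-even at "
          + byteApproach / bytesPerTerm + " terms");
      System.out.println("float+bitset: " + floatApproach / 1000000 + " MB, break-even at "
          + floatApproach / bytesPerTerm + " terms");
    }
  }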

Doug
