Tony,

If your improvements are of general utility, please contribute them. Even if they are not, post them as-is and perhaps someone will take the time to make them more reusable.

Cheers,

Doug

Tony Schwartz wrote:
I think there are a few things that should be added to Lucene to give a huge
benefit to applications of Lucene that are centered around dates. If documents
are added in date order (generally, but not exactly), you can use this fact to
improve Lucene's memory usage in several ways:

1.  A sparse bitset can be used instead of a full-size bit array for date RangeFilters.
2.  Sorting can be improved by writing the StringIndex (the sort array) to disk
when the index is updated, and then loading only the portions required for a
particular search. If most people search the more recent docs, you can keep
those portions of the sort array in memory and load the "older" portions only
when needed.
3.  The same sparse (and reversible) bitset can be used instead of Lucene's
BitVector to store the deleted docs of a segment (the deleted docs are probably
the very old ones, again based on date).
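4.  Sorting can also be greatly improved by NOT keeping the field values in
memory if the index is not used in a "multi-index" environment (within a single
index the per-document sort order is enough to compare hits; the values
themselves are only needed to merge results across indexes).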
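Items 1 and 3 do not say how the sparse bitset is built. One plausible reading, sketched here in Java as an assumption rather than as the implementation described above, is a run-length set: set bits are stored as sorted, half-open docid runs, so a date range or a block of deleted old documents that maps to a few contiguous docid ranges costs a few ints instead of one bit per document, and the complement (the "reversible" part) is again a short run list. The names RunBitSet, addRun and complement are hypothetical and not part of Lucene; wiring this into Lucene's filter or deletion code is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical run-length bitset (not a Lucene class). Set bits are kept as
 * sorted, non-overlapping, non-adjacent half-open runs [start, end), so memory
 * grows with the number of runs rather than with the number of documents.
 */
class RunBitSet {
    private final int size;                              // number of docs (maxDoc)
    private final List<int[]> runs = new ArrayList<>();  // each entry is {start, end}

    RunBitSet(int size) {
        this.size = size;
    }

    /** Appends the run [start, end); runs must be added in increasing docid order. */
    void addRun(int start, int end) {
        if (start < 0 || end > size || start >= end) {
            throw new IllegalArgumentException("bad run [" + start + ", " + end + ")");
        }
        if (!runs.isEmpty()) {
            int[] last = runs.get(runs.size() - 1);
            if (start < last[1]) {
                throw new IllegalArgumentException("runs must be added in increasing order");
            }
            if (start == last[1]) {  // touching the previous run: extend it instead of adding
                last[1] = end;
                return;
            }
        }
        runs.add(new int[] {start, end});
    }

    /** Binary search for a run containing doc. */
    boolean get(int doc) {
        int lo = 0;
        int hi = runs.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int[] r = runs.get(mid);
            if (doc < r[0]) {
                hi = mid - 1;
            } else if (doc >= r[1]) {
                lo = mid + 1;
            } else {
                return true;
            }
        }
        return false;
    }

    /** The complement within [0, size) is the list of gaps between runs. */
    RunBitSet complement() {
        RunBitSet c = new RunBitSet(size);
        int prev = 0;
        for (int[] r : runs) {
            if (r[0] > prev) {
                c.addRun(prev, r[0]);
            }
            prev = r[1];
        }
        if (prev < size) {
            c.addRun(prev, size);
        }
        return c;
    }

    int runCount() {
        return runs.size();
    }

    /** Number of set bits. */
    int cardinality() {
        int n = 0;
        for (int[] r : runs) {
            n += r[1] - r[0];
        }
        return n;
    }
}
```

For the deleted-docs case in item 3, deleting the oldest N documents becomes a single addRun(0, N) call, and complement() returns the live documents as one run.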

I have implemented these techniques in my own application-log search tool and
have seen incredible results. Many users search 50 million application logs
(about 1 KB each) with 512 MB of memory for the app, and they sort and filter
on every search.

Again, these features are only useful for indexes in which date and docid are
roughly correlated (which I believe is very common).
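To show why the date-to-docid correlation matters, the hypothetical helper below (not a Lucene API) collects the docids whose date falls in a range as half-open runs. It assumes, for illustration only, that each document's date is already available in an in-memory long[] indexed by docid; a real filter would more likely walk the date terms in the index. When docs were added in roughly date order, matching docs cluster and only a few runs come out; without that correlation the run count grows toward the number of matches and a sparse representation stops paying off.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene. */
class DateRuns {
    /**
     * Returns the docids whose date lies in [from, to] as half-open runs
     * {start, end}, in increasing docid order. Assumes dateByDoc[doc] holds
     * the date of document doc (an illustrative in-memory stand-in).
     */
    static List<int[]> matchingRuns(long[] dateByDoc, long from, long to) {
        List<int[]> runs = new ArrayList<>();
        int runStart = -1;  // -1 means no run is currently open
        for (int doc = 0; doc < dateByDoc.length; doc++) {
            boolean match = dateByDoc[doc] >= from && dateByDoc[doc] <= to;
            if (match && runStart < 0) {
                runStart = doc;                      // first match after a gap opens a run
            } else if (!match && runStart >= 0) {
                runs.add(new int[] {runStart, doc}); // first non-match closes the open run
                runStart = -1;
            }
        }
        if (runStart >= 0) {
            runs.add(new int[] {runStart, dateByDoc.length});  // run reaches the last doc
        }
        return runs;
    }
}
```

With dates {1, 2, 3, 5, 4, 6, 7, 8, 9, 10} (slightly out of order) and the range [3, 6], the matches are docs 2 through 5, so the whole filter is the single run [2, 6).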

Tony Schwartz
[EMAIL PROTECTED]
"What we need is more cowbell."
