Tony,
If your improvements are of general utility, please contribute them.
Even if they are not, post them as-is and perhaps someone will take the
time to make them more reusable.
Cheers,
Doug
Tony Schwartz wrote:
I think a few things should be added to Lucene that would greatly benefit
applications centered around dates. If documents are added in date order
(generally, but not exactly), you can use this fact to reduce Lucene's memory
usage in several ways.
1. A sparse bitset can be used instead of a full array for Date RangeFilters.
2. Sorting can be improved by storing the StringIndex (sort array) to disk when
the index is updated, then loading only the portions required for a particular
search. If most people search the more recent docs, you can keep those
portions of the sort array in memory and load the "older" portions only when
needed.
3. The same sparse (and reversible) bitset can be used instead of the Lucene
BitVector for storing the deleted docs of a particular segment (very old docs
are probably deleted, again based on date).
4. Sorting can also be greatly improved by NOT storing the field values in
memory if the index is not used in a "multi-index" environment.
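
To make items 1 and 3 concrete, here is a minimal Java sketch of a block-based
sparse bitset. It is not the implementation described in this message and it
is not a Lucene class: the name SparseBitSet, the 4096-bit block size, and the
method names are all assumptions for illustration. The idea is that when the
matching doc ids cluster together (as they would for a date range over a
roughly date-ordered index), only the blocks that contain set bits need memory.

```java
import java.util.TreeMap;

/** Hypothetical sketch: allocates 64-bit words only for blocks that hold set bits. */
final class SparseBitSet {
    private static final int BLOCK_BITS = 4096;                 // assumed block size
    private static final int WORDS_PER_BLOCK = BLOCK_BITS / 64; // 64 longs per block

    // Block number -> words of that block; absent blocks are all zeros.
    private final TreeMap<Integer, long[]> blocks = new TreeMap<>();

    /** Marks a (non-negative) doc id, allocating its block on first use. */
    void set(int docId) {
        long[] block = blocks.computeIfAbsent(docId / BLOCK_BITS,
                                              k -> new long[WORDS_PER_BLOCK]);
        int offset = docId % BLOCK_BITS;
        block[offset >>> 6] |= 1L << (offset & 63);
    }

    /** Returns true if the doc id was set; unallocated blocks read as false. */
    boolean get(int docId) {
        long[] block = blocks.get(docId / BLOCK_BITS);
        if (block == null) {
            return false;
        }
        int offset = docId % BLOCK_BITS;
        return (block[offset >>> 6] & (1L << (offset & 63))) != 0;
    }

    /** Number of blocks that currently occupy memory. */
    int allocatedBlocks() {
        return blocks.size();
    }
}
```

With this layout, setting the last 1,000 doc ids of a 50-million-document index
touches two blocks (about 1 KB of words), where a full bitset would need one
bit for every document in the index.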
I have implemented these techniques in my own application, a log search tool,
and have seen incredible results. I have many users searching 50 million
application logs (1k each) with 512 MB of memory for my app, where users are
sorting and filtering on every search.
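
For a sense of scale, the following back-of-the-envelope Java sketch estimates
what the dense structures would cost at 50 million documents. It assumes one
bit per document for a filter bitset and one 4-byte int per document for a
sort order array, and it does not count the field values themselves. The class
and method names are hypothetical, and the figures are estimates, not
measurements from the application described above.

```java
/** Hypothetical sketch: rough sizes of dense per-document structures. */
final class MemoryEstimate {
    /** Bytes for a full bitset with one bit per document. */
    static long fullBitSetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    /** Bytes for a sort order array with one int per document. */
    static long ordArrayBytes(long maxDoc) {
        return maxDoc * Integer.BYTES;
    }

    public static void main(String[] args) {
        long maxDoc = 50_000_000L;
        System.out.println("filter bitset bytes: " + fullBitSetBytes(maxDoc));
        System.out.println("sort order array bytes: " + ordArrayBytes(maxDoc));
    }
}
```

Under these assumptions each cached filter costs 6,250,000 bytes and each
sorted field costs 200,000,000 bytes for the order array alone, which is a
large share of a 512 MB heap.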
Again, these features will only be useful for indexes that have a relative
date-to-docid correlation (which I believe happens to be very common).
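
This correlation is what makes the on-disk sort array of item 2 practical:
recent documents have the highest doc ids, so they fall in the last few pages
of the array. The Java sketch below illustrates the idea under stated
assumptions. PagedSortArray, its 1024-entry page size, and the flat file of
big-endian ints are all hypothetical and are not Lucene's StringIndex format.
Pages are read on first access and kept in memory, and there is no eviction
or bounds checking.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a per-document int array read from disk one page at a time. */
final class PagedSortArray implements AutoCloseable {
    private static final int PAGE_SIZE = 1024; // entries per page (assumed)
    private final FileChannel channel;
    private final Map<Integer, int[]> pages = new HashMap<>();

    PagedSortArray(Path file) {
        try {
            this.channel = FileChannel.open(file, StandardOpenOption.READ);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the sort positions to a temp file as consecutive big-endian ints. */
    static Path writeTemp(int[] ords) {
        try {
            Path file = Files.createTempFile("sort-array", ".bin");
            ByteBuffer buf = ByteBuffer.allocate(ords.length * Integer.BYTES);
            for (int ord : ords) {
                buf.putInt(ord);
            }
            Files.write(file, buf.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the sort position of a doc id, loading its page on first access. */
    int get(int docId) {
        int pageNo = docId / PAGE_SIZE;
        int[] page = pages.get(pageNo);
        if (page == null) {
            page = loadPage(pageNo);
            pages.put(pageNo, page);
        }
        return page[docId % PAGE_SIZE];
    }

    private int[] loadPage(int pageNo) {
        try {
            long start = (long) pageNo * PAGE_SIZE * Integer.BYTES;
            long available = channel.size() - start;
            int bytes = (int) Math.min((long) PAGE_SIZE * Integer.BYTES, available);
            ByteBuffer buf = ByteBuffer.allocate(bytes);
            long pos = start;
            while (buf.hasRemaining()) {
                int n = channel.read(buf, pos); // positional read
                if (n < 0) {
                    break;
                }
                pos += n;
            }
            buf.flip();
            int[] page = new int[PAGE_SIZE]; // the last page may be only partly filled
            buf.asIntBuffer().get(page, 0, buf.remaining() / Integer.BYTES);
            return page;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of pages currently held in memory. */
    int loadedPages() {
        return pages.size();
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A search that only touches recent documents would load only the pages at the
end of the file; older pages are read from disk the first time a search
reaches them.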
Tony Schwartz
[EMAIL PROTECTED]
"What we need is more cowbell."