Paul Elschot wrote:
> On Friday 11 April 2008 13:49:59, Mathieu Lecarme wrote:
>> Use Filter and BitSet.
>> From the personal data, you build a Filter
>> (http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Filter.html)
>> which is used in the main index.
>
> With 1 billion mails, and possibly a Filter per user, you may want to
> use more compact filters than BitSets, which is currently possible
> in the development trunk of Lucene.
Thanks for the pointers. I've already used Solr's DocSet interface in my
implementation, which I think is where the ideas for the current Lucene
enhancements came from. They work well to reduce the filter's footprint. I'm
also caching filters.
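To put rough numbers on the footprint point, here is a minimal sketch of a compact doc-ID set held as a sorted int array. The class SortedIntDocSet and its methods are hypothetical names made up for this illustration; this is not Solr's DocSet code nor the Lucene trunk implementation, only the general idea of storing the set members instead of one bit per document.

```java
import java.util.Arrays;

/** Hypothetical compact doc-ID set: stores only the matching doc IDs, sorted. */
final class SortedIntDocSet {
    private final int[] docs; // sorted, distinct doc IDs

    SortedIntDocSet(int[] docIds) {
        int[] copy = docIds.clone();
        Arrays.sort(copy);
        // Compact in place, dropping duplicate IDs.
        int n = 0;
        for (int i = 0; i < copy.length; i++) {
            if (i == 0 || copy[i] != copy[i - 1]) {
                copy[n++] = copy[i];
            }
        }
        this.docs = Arrays.copyOf(copy, n);
    }

    /** Membership test by binary search over the sorted IDs. */
    boolean contains(int docId) {
        return Arrays.binarySearch(docs, docId) >= 0;
    }

    int size() {
        return docs.length;
    }

    /** Payload size in bytes: 4 bytes per stored doc ID (object overhead ignored). */
    long payloadBytes() {
        return 4L * docs.length;
    }

    /** Payload size in bytes of a plain bit set covering maxDoc documents. */
    static long bitSetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }
}
```

Under these assumptions a plain bit set over 1 billion mails costs about 125 MB per filter regardless of how many mails are labelled, while a user with 10,000 labelled mails needs about 40 KB as a sorted int array. The array only wins while fewer than roughly 1 in 32 documents are set, so a real implementation would probably switch representation by density.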
The intention is that there is a user data index and one or more mail indexes. A
search against the user data index will return a set of mail IDs, which are the
common key between the two. Doc IDs are not comparable across indexes, so that
means a potentially large boolean OR query to create the filter of labelled
mails in the mail indexes. I know it's a theoretical question, but will this
perform?
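A minimal sketch of that join, using plain Java collections as a toy stand-in for a mail index rather than the Lucene API. The class MailIdJoin, its methods, and the sample mail IDs are all hypothetical; the map plays the role of the index's term dictionary on the mail ID field. Instead of running one large OR query, it looks up each mail ID directly and sets the matching bit.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of building a filter from mail IDs, not Lucene code. */
final class MailIdJoin {

    /** Toy mail index: maps each mail ID to its doc ID (its position in the list). */
    static Map<String, Integer> toyMailIndex(String... mailIdsInDocOrder) {
        Map<String, Integer> index = new HashMap<String, Integer>();
        for (int doc = 0; doc < mailIdsInDocOrder.length; doc++) {
            index.put(mailIdsInDocOrder[doc], doc);
        }
        return index;
    }

    /**
     * Builds the filter bits for one mail index from the mail IDs returned by
     * the user data search. IDs unknown to this index are skipped.
     */
    static BitSet buildFilter(Iterable<String> labelledMailIds,
                              Map<String, Integer> mailIndex,
                              int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String mailId : labelledMailIds) {
            Integer doc = mailIndex.get(mailId); // one key lookup per mail ID
            if (doc != null) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

In this toy model the cost grows with the number of labelled mails, not with the size of the mail index. In Lucene 2.x the corresponding lookup would presumably be one IndexReader.termDocs(Term) call per mail ID, which also sidesteps BooleanQuery's default limit of 1024 clauses; whether that is fast enough at this scale would still need measuring.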
The read-only mail data and the modifiable user data need to be kept separate
because the read-only data can easily be re-created, which means I can't just
create the filter as part of the base search.
Regards
Antony