Hello,

I am working on using Lucene-based indexes for the ASF's mod_mbox. Current versions of mod_mbox support MIME, and I am trying to add full-text searching. (Then we can completely remove Eyebrowse.)

Currently I am hacking around with the C++ implementation (CLucene), but I intend to migrate to Lucene4c shortly.

I am structuring this as one Lucene index per mailing list. To search all mailing lists, I was planning on using a MultiSearcher.

Currently, the ASF public mail archives take up about 17 GB, uncompressed, in the raw mbox format.

There are also about 300 mailing lists in the public archives.

Can a MultiSearcher quickly search 300 different indexes? I suspect that it cannot. 300 separate indexes is a lot of files to scan, even if Lucene is fast. Any experience from other users would be helpful.

Would it be better to have a single main index for all of the lists, and include the list name as a keyed field?

I suspect most searches would be restricted to one or two lists, but I would like good performance if I wanted to search all of the ASF lists.
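
To make the single-index option concrete, here is a rough sketch of what I have in mind. It is a toy in-memory inverted index in Python, not Lucene or its API; the class name, method names, and sample list names are all hypothetical and only illustrate storing the list name alongside each message and filtering on it at search time.

```python
from collections import defaultdict


class SingleIndex:
    """Toy stand-in for one combined index (NOT the Lucene API)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of message ids
        self.list_of = {}                 # message id -> mailing list name

    def add(self, msg_id, list_name, text):
        # Record the list name as a per-message field, then index each term.
        self.list_of[msg_id] = list_name
        for term in text.lower().split():
            self.postings[term].add(msg_id)

    def search(self, term, lists=None):
        # lists=None searches every list; otherwise keep only the named lists.
        hits = self.postings.get(term.lower(), set())
        if lists is not None:
            hits = {m for m in hits if self.list_of[m] in lists}
        return sorted(hits)


# Hypothetical sample data, for illustration only.
index = SingleIndex()
index.add(1, "dev@httpd", "mod_mbox MIME support")
index.add(2, "dev@apr", "pool cleanup and MIME parsing")
index.add(3, "dev@httpd", "full text search idea")
```

With this layout, searching every list and searching one or two lists are the same lookup, with an optional filter on the list-name field. Whether the real thing behaves as nicely at 17 GB is exactly what I am unsure about.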

Ideas or comments? Anyone willing to help me write some C? :)

Thanks,

-Paul
