I am currently the administrator for a mailing list archive/search site for a number of internal-only mailing lists (about 20). The current search engine is based on HTDig and works "ok". By ok I mean it seems to work most of the time. Occasionally it fails to reindex for no reason I can find, it is horrifically slow to index (~40 hrs on a dual-processor Opteron w/ 4GB of RAM), and the database that the search indices are stored in corrupts far too easily. To add to all of this, development of HTDig seems to have stalled or died completely (not sure which).

Due to a hardware failure on a box that was not being backed up (this was not my box), and a few personnel changes, I am now forced to rebuild the archive/search system from the ground up. What I am being given is access to the mbox files for each mailing list, and pretty much nothing else. I have no access to the admin functions on the mailing list server, nor can I get any changes made to its configuration.

I am leaning toward using MHonArc to create the archive. What I need suggestions on is a search engine. I am looking for something that can handle a fairly large archive, on the order of 100-150k messages, that can easily index only new messages, and that can search groups of messages (i.e., I would like to be able to search across a selected group of mailing lists, all lists, or only a single list). I'd also like something that uses a standard DB as the backend (MySQL, Postgres, or something similar).
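To make those two requirements concrete (index only new messages, search a chosen group of lists), here is a rough Python sketch of the behavior I have in mind. It is only an illustration, not a candidate engine: sqlite3 stands in for MySQL/Postgres, the table layout and the function names (open_index, index_mbox, search) are made up for this example, and I am assuming Message-ID is a good enough key for spotting messages that are already indexed.

```python
import mailbox
import sqlite3


def open_index(path=":memory:"):
    """Create (or open) a tiny message index. sqlite3 is a stand-in backend."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " list TEXT, msgid TEXT, subject TEXT, body TEXT,"
        " PRIMARY KEY (list, msgid))"
    )
    return db


def index_mbox(db, list_name, mbox_path):
    """Index only messages not seen before; return how many were added."""
    added = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            msgid = msg.get("Message-ID")
            if msgid is None:
                continue  # no stable key to deduplicate on; skipped here
            seen = db.execute(
                "SELECT 1 FROM messages WHERE list = ? AND msgid = ?",
                (list_name, str(msgid)),
            ).fetchone()
            if seen:
                continue  # already indexed on an earlier run
            # Multipart messages are indexed by subject only in this sketch.
            payload = b"" if msg.is_multipart() else msg.get_payload(decode=True)
            body = (payload or b"").decode("utf-8", errors="replace")
            db.execute(
                "INSERT INTO messages VALUES (?, ?, ?, ?)",
                (list_name, str(msgid), str(msg.get("Subject", "")), body),
            )
            added += 1
    finally:
        box.close()
    db.commit()
    return added


def search(db, term, lists=None):
    """Substring search over subject and body, optionally limited to some lists."""
    sql = "SELECT list, subject FROM messages WHERE (subject LIKE ? OR body LIKE ?)"
    params = ["%" + term + "%", "%" + term + "%"]
    if lists:
        sql += " AND list IN (" + ",".join("?" * len(lists)) + ")"
        params.extend(lists)
    return db.execute(sql + " ORDER BY list, subject", params).fetchall()
```

Running index_mbox a second time over an unchanged mbox should add nothing, and passing lists=None to search covers every list. A real engine would use a proper full-text index rather than LIKE, which would not scale to 100-150k messages.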

Due to the nature of the lists, I cannot use an external search engine. Everything must be kept in house. The server I have to host this on is running RHEL 5 and Apache. I have complete control of this server, so I can make changes as I see fit (other than changing the OS).

So does anybody have any suggestions on a search engine that they have used that seems to work well? Did I leave anything out? I see a kitchen sink in the corner I didn't mention, but.....

-G
_______________________________________________
EUGLUG mailing list
[email protected]
http://www.euglug.org/mailman/listinfo/euglug
