I am currently the administrator for a mailing list archive/search site
for a number of internal-only mailing lists (about 20). The current
search engine is based on HTDig and works "ok". By ok I mean it seems to
work most of the time. Occasionally it fails to reindex for no reason I
can find, it is horrifically slow to index (~40hrs on a Dual Processor
Opteron w/4GB of RAM), and the database that the search indices are
stored in corrupts far too easily. To add to all of this, development of
HTDig seems to have stalled or died completely (not sure which).

Due to a hardware failure on a box that was not being backed up (this
was not my box), and a few personnel changes, I am now forced to rebuild
the archive/search system from the ground up. What I am being given is
access to the mbox files for each mailing list, and pretty much nothing
else. I have no access to the admin functions on the mailing list
server, nor can I get any changes made to its configuration.

I am leaning toward using MHonArc to create the archive. What I need
suggestions on is a search engine. I am looking for something that can
handle a fairly large archive of messages, say on the order of 100-150k
messages, that can easily index only new messages, and that can search
groups of messages (i.e., I would like to be able to search across a
selected group of mailing lists, all lists, or only a single list). I'd
also like something that used a standard DB as the backend (MySQL,
Postgres, or something similar).
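
To make those requirements concrete, here is a rough sketch of the
behaviour I am after, not a proposed implementation. It assumes Python 3
with only the standard library, and it uses SQLite purely as a stand-in
for MySQL/Postgres. The function names (open_index, index_mbox, search),
the table layout, and the plain LIKE matching are all made up for
illustration; a real engine would use a proper full-text index.

```python
import mailbox
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the hypothetical index database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " list TEXT, msgid TEXT, subject TEXT, body TEXT,"
        " PRIMARY KEY (list, msgid))"
    )
    return db


def body_text(msg):
    """Return the first text/plain part of a message as a string."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        try:
            return payload.decode(part.get_content_charset() or "utf-8", "replace")
        except LookupError:
            # Unknown charset name in the message headers.
            return payload.decode("utf-8", "replace")
    return ""


def index_mbox(db, list_name, mbox_path):
    """Index only messages not seen before; return how many were added."""
    added = 0
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            msgid = msg.get("Message-ID")
            if not msgid:
                continue  # nothing stable to key on, skip in this sketch
            msgid = str(msgid).strip()
            seen = db.execute(
                "SELECT 1 FROM messages WHERE list = ? AND msgid = ?",
                (list_name, msgid),
            ).fetchone()
            if seen:
                continue
            db.execute(
                "INSERT INTO messages VALUES (?, ?, ?, ?)",
                (list_name, msgid, str(msg.get("Subject", "")), body_text(msg)),
            )
            added += 1
    finally:
        box.close()
    db.commit()
    return added


def search(db, term, lists=None):
    """Search all lists, or only the given list names, for a substring."""
    sql = ("SELECT list, subject FROM messages"
           " WHERE (subject LIKE ? OR body LIKE ?)")
    params = ["%" + term + "%", "%" + term + "%"]
    if lists:
        sql += " AND list IN (%s)" % ",".join("?" * len(lists))
        params += list(lists)
    return db.execute(sql + " ORDER BY list, subject", params).fetchall()
```

In this sketch, calling index_mbox() again on the same mbox file only
adds messages whose Message-ID has not been stored yet for that list,
and search(db, "term", lists=["list-a", "list-b"]) restricts the search
to a chosen group of lists (leave lists out to search everything). Note
that this still rereads the whole mbox on every run; it only avoids
reinserting messages that are already indexed.
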
Due to the nature of the lists, I cannot use an external search engine.
Everything must be kept in house. The server I have to host this on is
running RHEL 5 and Apache. I have complete control of this server, so I
can make changes as I see fit (other than changing the OS).

So, does anybody have any suggestions on a search engine that they have
used that seems to work well? Did I leave anything out? I see a kitchen
sink in the corner I didn't mention, but.....
-G
_______________________________________________
EUGLUG mailing list
[email protected]
http://www.euglug.org/mailman/listinfo/euglug