At 12:47 PM +0100 10/29/01, Peter Asemann wrote:
>So what version should I install? Should I use multiple databases or just
>one?

I would install the current pre-release of 3.1.6.
<http://www.htdig.org/files/snapshots/>

>The 'environment' will be a rather big university, where sites of several
>dozens of institutes will have to be indexed, with the possibility
>to restrict the search to groups of or to single institutes.

Depending on your search criteria, you may need to have multiple
databases. If the institutes all have unique URL patterns (e.g.
different servers), then you could have one big database and use the
restrict/exclude features to confine the search. However, it's
sometimes difficult to define arbitrary "groups" this way with 3.1.x
versions.
<http://www.htdig.org/hts_form.html>

>So assuming a rather big index, what CPU/Ram would be needed to get results
>really fast? I don't want google and alltheweb to be faster than a local
>search.

This is something of a silly request. These companies have large
server farms, carefully tuned setups (including pre-cached queries)
and large portions of the database in RAM. If you'd like to emulate
all this because you feel htsearch is too slow on "normal" hardware,
I'd suggest:

* Setting up a small server farm of your own
* Getting 2GB or more RAM for each server (i.e. enough for the OS to
  cache the database)
* Getting a very nice RAID setup to serve the database
* Finding a developer willing to flesh out some search caching
  algorithms for htsearch
* Carefully tuning your "bad words" file to include the most common
  words on these sites (these carry almost no information for a
  search, e.g. "the" in English).

You can, of course, get pretty good performance without all of these
if you get a nice fast hard drive, a chunk of RAM and a tuned
bad_words file.

Regards,
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
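
As an illustration of the single-database approach with restrict, here
is a minimal sketch of building an htsearch query URL that confines a
search to a group of institutes. The CGI location, the institute names
and their URL prefixes are all made up for the example. It assumes that
multiple restrict patterns are joined with "|", as described on the
hts_form.html page above; check that against the version you install.
The optional "config" input is how a form would select a separate
database if you go the multiple-database route instead.

```python
from urllib.parse import urlencode

# Hypothetical values: the CGI location and the institute URL prefixes
# are placeholders, not taken from any real installation.
HTSEARCH_URL = "http://www.example.edu/cgi-bin/htsearch"

INSTITUTES = {
    "physics": "http://physics.example.edu/",
    "chemistry": "http://chemistry.example.edu/",
    "math": "http://math.example.edu/",
}


def build_search_url(words, institutes=(), config=None):
    """Return an htsearch URL, optionally restricted to some institutes."""
    params = {"words": words}
    if institutes:
        # Assumption: htsearch accepts several restrict patterns
        # separated by "|" and matches URLs against any of them.
        params["restrict"] = "|".join(INSTITUTES[name] for name in institutes)
    if config:
        # Selects an alternate config file (and thus database) by name.
        params["config"] = config
    return HTSEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Search only the physics and math sites.
    print(build_search_url("quantum", ["physics", "math"]))
```

Defining a "group" then comes down to maintaining the list of URL
prefixes per group on the form side, which is workable as long as each
institute really does have its own URL pattern.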

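
For the bad_words tuning, one rough way to find candidates is to count
word frequencies over a sample of the text you are indexing and look at
the top of the list. The sketch below is not part of ht://Dig; the
function name and the idea of feeding it plain-text page contents are
assumptions for the example. It only proposes candidates, so review
them by hand before adding any to the file.

```python
import re
from collections import Counter


def bad_word_candidates(texts, top_n=50, min_length=1):
    """Return the top_n most frequent words across an iterable of texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep runs of ASCII letters only; a real site
        # with non-English content would need a wider pattern.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if len(w) >= min_length)
    return [word for word, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    sample = [
        "The institute of physics and the institute of chemistry",
        "The department of mathematics and the library",
    ]
    # One word per line, the same layout as the bad_words file.
    for word in bad_word_candidates(sample, top_n=3):
        print(word)
```

Words that are frequent but still meaningful for your users (an
institute name, say) should stay out of the file, since anything
listed there cannot be searched for at all.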
