At 12:47 PM +0100 10/29/01, Peter Asemann wrote:
>So what version should I install? Should I use multiple databases or just
>one?

I would install the current pre-release of 3.1.6. 
<http://www.htdig.org/files/snapshots/>

>The 'environment' will be a rather big university, where sites of several
>dozens of institutes will have to be indexed, with the possibility 
>to restrict the search to groups of or to single institutes.

Depending on your search criteria, you may need to have multiple 
databases. If the institutes all have unique URL patterns (e.g. 
different servers), then you could have one big database and use the 
restrict/exclude features to confine the search. However, it's 
sometimes difficult to define arbitrary "groups" this way with 3.1.x 
versions.

<http://www.htdig.org/hts_form.html>
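
To make the single-database approach a bit more concrete, here is a rough sketch of how a front end might build an htsearch query confined to one institute or a group of them. It assumes the "words" and "restrict" form fields described on the page above, and that several restrict patterns can be joined with "|". The CGI location, host names and group table are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical CGI location; substitute wherever htsearch is installed.
HTSEARCH_URL = "http://search.example.edu/cgi-bin/htsearch"

# Hypothetical table mapping a group name to the URL patterns of its institutes.
INSTITUTE_GROUPS = {
    "physics": ["http://physics.example.edu/"],
    "sciences": ["http://physics.example.edu/", "http://chem.example.edu/"],
}


def build_search_url(words, group=None):
    """Build an htsearch query URL, optionally restricted to one group."""
    params = {"words": words}
    if group is not None:
        # Assumption: multiple restrict patterns are separated by "|".
        params["restrict"] = "|".join(INSTITUTE_GROUPS[group])
    return HTSEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("quantum optics", group="sciences"))
```

This only works if each institute really can be described by a URL pattern. If groups cut across servers in ways a pattern can't express, separate databases are probably the easier route.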

>So assuming a rather big index, what CPU/Ram would be needed to get results
>really fast? I don't want google and alltheweb to be faster than a local
>search.

This is something of a silly request. These companies have large 
server farms, carefully tuned setups (including pre-cached queries) 
and large portions of the database in RAM. If you'd like to emulate 
all this because you feel htsearch is too slow on "normal" hardware, 
I'd suggest:

* Setting up a small server farm of your own
* Getting 2GB or more RAM for each server (so the OS can cache the database)
* Getting a very nice RAID setup to serve the database
* Finding a developer willing to flesh out some search caching 
algorithms for htsearch
* Carefully tuning your "bad words" file to include the most common 
words on these sites (these carry almost no information for a search, 
e.g. "the" in English)

You can, of course, get pretty good performance without all of these 
if you get a nice fast hard drive, a chunk of RAM and a tuned 
bad_words file.
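
As a starting point for tuning the bad_words file, a small script along these lines could suggest candidates by finding words that show up in nearly every page. This is only a sketch: the function name, the 80% threshold and the idea of feeding it plain page text are my assumptions, not anything ht://Dig provides.

```python
import re
from collections import Counter

# Runs of Unicode letters (no digits or underscores), so umlauts etc. are kept.
WORD_RE = re.compile(r"[^\W\d_]+")


def bad_word_candidates(texts, top_n=50, min_fraction=0.8):
    """Return words found in at least min_fraction of the documents,
    most widespread first (ties broken alphabetically)."""
    doc_counts = Counter()
    total = 0
    for text in texts:
        total += 1
        # Count each word once per document, not once per occurrence.
        doc_counts.update(set(WORD_RE.findall(text.lower())))
    if total == 0:
        return []
    common = [(w, n) for w, n in doc_counts.items() if n / total >= min_fraction]
    common.sort(key=lambda item: (-item[1], item[0]))
    return [w for w, _ in common[:top_n]]


if __name__ == "__main__":
    pages = ["the cat sat", "the dog ran", "the bird flew"]
    for word in bad_word_candidates(pages):
        print(word)
```

Review the output by hand before adding anything, one word per line, to the file your configuration points to (I believe the attribute is bad_word_list). A word that is common on your sites may still be one people search for, and you will likely need to rebuild the index afterwards.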

Regards,
-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
