I am indexing a large university site with multiple web servers and a lot of content.
I'm using an early 3.2b4 on Solaris 8 Sparc.
My strategy has been to run multiple parallel digs of the different sites (this speeds up the digging), then use htmerge to join the results.
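
For illustration, here is a rough sketch of the parallel-dig-then-merge step. The config file names are made up; this assumes one config per site, each with its own database_dir, and uses htmerge's -m option to fold one set of databases into another (check your version's htmerge docs, as the 3.2 betas reshuffled some of this):

    # Dig each site in parallel; each config points database_dir
    # at its own directory.  (site*.conf and main.conf are
    # hypothetical names.)
    htdig -i -c site1.conf &
    htdig -i -c site2.conf &
    htdig -i -c site3.conf &
    wait

    # Fold the per-site databases into the main set one at a time.
    htmerge -c main.conf -m site1.conf
    htmerge -c main.conf -m site2.conf
    htmerge -c main.conf -m site3.conf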
htmerge, however, consumes a lot of memory (although this may be addressed in later betas), and I find I need a machine with 12 GB of RAM to avoid swapping. I hit the 2 GB memory limit with a 32-bit compile, so I had to do a 64-bit compile, which required a few edits to the source code. You may also hit file size limits on a 32-bit system: my word db (uncompressed, as compression was causing crashes) can grow beyond 2 GB.
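
For reference, the 64-bit build on Solaris/SPARC with gcc comes down to something like the following (flags from memory, so treat them as a starting point; the source edits I mentioned are not shown). If you stay on a 32-bit build, I believe the standard largefile defines are what get database files past the 2 GB limit:

    # 64-bit compile (gcc on SPARC)
    CC="gcc -m64" CXX="g++ -m64" ./configure
    make && make install

    # Alternatively, on a 32-bit build, enable largefile support
    # so database files can exceed 2 GB:
    CPPFLAGS="-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64" ./configure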
Our other issue is that with large databases, search times are slow when a query returns a lot of results. I believe this is because the relevance sorting is done in memory. I am hoping that upgrading to a machine with faster memory, a faster processor, and more cache will improve things, since memory latency is probably the bottleneck.
To allow searching and indexing to run concurrently, I build the new index separately and, when it is ready, mv the old one out of the way and mv the new one into place. Downtime for search is thus under a second (see the sketch below).
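
Concretely, the swap looks something like this (paths and config names are made up):

    # Build the new databases off to the side (new.conf's
    # database_dir points at /search/db.new -- hypothetical path).
    htdig -i -c new.conf

    # When the new index is ready, swap it in.  Search is only
    # unavailable for the instant between the two mv commands.
    mv /search/db /search/db.old
    mv /search/db.new /search/db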
I will be trying the 3.2b5 release soon and will share any experiences.
Regards,
Sandy