Jim said:

> Did you try setting ignore_dead_servers to false as I suggested earlier?
Yes, I added that and it did help a bit. The crawl conked out after about 33,000 pages this time, so at least we doubled the number.

I'm trying to weed out a few more duplicates to reduce the crawl size and see if that helps, and I've added more directories to the exclude list. Does the exclude list allow regex? The reason I ask is that, for example's sake, a story about the new Htdig movie may appear at:

  /news/main/2004/08/30/htdig-casting
  /news/hughgrant/2004/08/30/htdig-casting
  /news/angelinajolie/2004/08/30/htdig-casting
  /news/melgibson/2004/08/30/htdig-casting

Needless to say, I don't need to index that story four times. But since the list of people is continuously growing, I can't keep heading back into the conf file to add each and every /news/celebname (see the config excerpt at the end of this mail).

Thanks again.
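P.S. For reference, here's roughly what the relevant part of my htdig.conf looks like right now. The names are just the examples from above, and my (possibly wrong) understanding is that exclude_urls treats each space-separated pattern as a plain substring match against the URL, which is exactly why a regex option would save me from maintaining this list by hand:

  # per Jim's earlier suggestion
  ignore_dead_servers: false

  # every new celebrity section has to be added here manually
  exclude_urls: /news/hughgrant/ /news/angelinajolie/ /news/melgibson/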