
[aseek-devel] Is it a bug? Or is a new feature needed?

J and T
Sun, 22 Sep 2002 15:21:06 -0700

Today I started a crawl of about 200,000 URLs. One of the sites, which
accounts for about 2,000 of those URLs, is no longer active. They closed
their doors. When the indexer runs, it reports "Can't connect to host" for
that site. It seems the DNS records for the domain are still active (never
removed), but the site is not operational. The problem is that index still
tries to connect to this host for every single one of its pages in the
index. Because each attempt doesn't time out for something like 90 seconds,
index pretty much hangs forever: 2,000 URLs at about 90 seconds each is
roughly 50 hours if they are tried one after another. Sure, if I monitored
index 24/7 I guess I could halt it, run ./index -C "http://sitename%", and
then start the process all over, but I'm not always sitting there.

The best solution would be for index to have the ability to mark all URLs
for such a host as status 500 once it fails to connect, so the indexer
wouldn't hang on every future URL requested for this domain. It would
certainly improve performance.
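
A rough sketch of the idea in Python, just to illustrate the behaviour. This
is not ASPseek code: crawl, fetch, HostUnreachable and max_failures are
hypothetical names, and I am assuming the fetch routine can signal a failed
connect separately from other errors.

```python
from urllib.parse import urlsplit


class HostUnreachable(Exception):
    """Raised by the (hypothetical) fetch callable when a connect fails."""


def crawl(urls, fetch, max_failures=1):
    """Fetch each URL, skipping hosts that already failed to connect.

    Returns a dict mapping URL -> status code. Once a host has failed
    max_failures times, its remaining URLs are marked 500 without
    another connection attempt.
    """
    failures = {}  # host -> number of failed connects so far
    statuses = {}  # url -> status code
    for url in urls:
        host = urlsplit(url).hostname
        if failures.get(host, 0) >= max_failures:
            # Host is already known to be dead: mark and move on.
            statuses[url] = 500
            continue
        try:
            statuses[url] = fetch(url)
        except HostUnreachable:
            failures[host] = failures.get(host, 0) + 1
            statuses[url] = 500
    return statuses
```

With max_failures=1 the dead site would cost one timeout instead of 2,000.
A higher value would be more forgiving of a host that is only briefly
unreachable.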

Just a suggestion,
John

