J and T
Sun, 22 Sep 2002 15:21:06 -0700
Today I started a crawl of about 200,000 URLs. One of the sites, which accounts for about 2,000 of those URLs, is no longer active. They closed their doors. When the indexer reaches that site it reports "Can't connect to host". It seems the DNS records for the domain are still active (never removed), but the site is not operational.

The problem is that index still tries to connect to this host for every single page it has for that site. Because each attempt doesn't time out for something like 90 seconds, index pretty much hangs forever. Sure, if I monitored index 24/7 I could halt it, do an ./index -C "http://sitename%" and then start the process all over, but I'm not always sitting there.

The best solution would be for index to have the ability to mark all URLs for such a domain as status 500, so the indexer wouldn't hang on every future URL requested for that domain. It would certainly improve performance.

Just a suggestion,
John
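
To make the suggestion concrete, here is a rough sketch of the idea in Python. It is not the indexer's actual code: `crawl` and `fetch` are hypothetical names, and `fetch` is assumed to return an HTTP status code or raise `ConnectionError` when the host can't be reached. After the first failed connection to a host, every remaining URL on that host is recorded as status 500 without another connection attempt.

```python
from urllib.parse import urlsplit


def crawl(urls, fetch):
    """Fetch each URL, skipping hosts that already failed to connect.

    `fetch` is a hypothetical callable supplied by the caller: it returns
    an HTTP status code, or raises ConnectionError if the host is down.
    """
    dead_hosts = set()   # hosts that refused a connection during this run
    results = {}         # url -> status code

    for url in urls:
        host = urlsplit(url).hostname

        # Host already known to be unreachable: mark 500, don't connect.
        if host in dead_hosts:
            results[url] = 500
            continue

        try:
            results[url] = fetch(url)
        except ConnectionError:
            # First failure for this host: remember it and mark the URL.
            dead_hosts.add(host)
            results[url] = 500

    return results
```

With 2,000 URLs on a dead host and a 90 second timeout, this pays for the timeout once instead of 2,000 times. A real implementation would probably want to allow a few failures before giving up on a host, in case the outage is only temporary.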