aseek-devel  

Re: [aseek-devel] Is it a bug? Or is a new feature needed?

Kir Kolyshkin
Mon, 23 Sep 2002 00:36:59 -0700

The best solution is to run many threads (say, -N 50 is not that bad).
If one site is unavailable, one thread will be stuck trying to reach it
while the other 49 threads keep running fine, so you lose only about 2%
of indexing speed, which seems OK to me.
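
To make the arithmetic explicit, here is a rough sketch in Python. It is
not part of index; speed_loss() is a made-up helper, and it assumes all
threads contribute equally and that a thread waiting on a dead host
contributes nothing until its connect attempt times out.

```python
def speed_loss(total_threads: int, stuck_threads: int = 1) -> float:
    """Fraction of throughput lost while `stuck_threads` are blocked.

    Hypothetical helper for illustration only: assumes every thread
    indexes at the same rate and a blocked thread does no useful work.
    """
    if total_threads <= 0:
        raise ValueError("total_threads must be positive")
    if not 0 <= stuck_threads <= total_threads:
        raise ValueError("stuck_threads must be between 0 and total_threads")
    return stuck_threads / total_threads


if __name__ == "__main__":
    # Compare a single-threaded run with larger -N values.
    for n in (1, 10, 50):
        print(f"-N {n}: {speed_loss(n):.0%} slower while one thread is stuck")
```

With a single thread the whole run stalls while the dead host is being
tried; with 50 threads the same stall costs roughly one fiftieth of the
throughput.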

J and T wrote:
> Today I started a crawl of about 200,000 URLs. One of the sites, with 
> about 2,000 URLs, is no longer active. They closed their doors. 
> When indexer is running it responds with "Can't connect to host". It 
> seems DNS records are still active (never removed) for the domain, but 
> the site is not operational. The problem is that index still tries to 
> connect to this host for every single page in the index. Because we 
> don't time out for like 90 seconds, index pretty much hangs forever. 
> Sure if I monitor index 24/7 I guess I could halt its operation and then 
> do an ./index -C "http://sitename%" and then start the process all over, 
> but I'm not always sitting there.
> 
> The best solution would be if index had the ability to mark all URLs as 
> status 500 so the indexer wouldn't hang on all future URLs requested for 
> this domain. It would certainly improve performance.
> 
> Just a suggestion,
> John


-- 
-- [EMAIL PROTECTED]  ICQ7551596  [EMAIL PROTECTED] --
    Guinness a Day Keeps a Doctor Away (people's wisdom)