On Saturday, March 23, 2002, at 02:02  PM, sascha mantscheff wrote:

> rather not let htdig spider the pages, but prefer to list them
> explicitly in start_url.
>
> 1) Is there a limit on the number of URLs in the start_url attribute?
> What about the system load when I start htdig with 1.25 million URLs?

No, there's no limit. However, one reason to spider pages is that the 
memory load is lower: you don't have to hold all those URLs in one 
assembled list at once.

> 2) How can I prevent htdig from spidering any links at all? Is there
> something like a "no-follow" attribute in the config? (I did not find
> anything like it.) I could include a limit_urls list with the same
> content as the start_url list, but this would mean 1.25 million URLs
> for htdig to parse with each link.

You probably want the max_hop_count attribute, which limits how many 
links away from the start_url documents htdig will follow:

http://www.htdig.org/attrs.html#max_hop_count
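
As a rough sketch of how this might look in the config file (not a 
tested setup): the fragment below assumes that documents named in 
start_url count as hop 0, so a max_hop_count of 0 should keep htdig 
from following any links found on them. The file path and the host 
name are made up for illustration. The backquotes ask htdig to read 
the attribute value from the named file, and limit_urls_to is set 
explicitly only because it otherwise defaults to the value of 
start_url.

```
# Hypothetical htdig.conf fragment; path and host are placeholders.

# Read the explicit URL list from a file, one URL per line
# (backquotes include the file's contents as the value).
start_url:      `/path/to/url_list.txt`

# Assumption: start_url documents are hop 0, so 0 means
# "index the listed pages, follow nothing".
max_hop_count:  0

# Defaults to ${start_url}; a short pattern avoids matching
# every link against the whole list.
limit_urls_to:  http://www.example.com/
```

It is worth trying this on a small URL list first to confirm that no 
linked pages end up in the index.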

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
