Hello there,

I'm indexing a database-driven website with about 10,000 text pages, about
1,000,000 forum entries and 250,000 public user records. For good reasons I'd
rather not let htdig spider the pages, but would prefer to list them explicitly
in start_url.

1) Is there a limit on the number of URLs in the start_url attribute? What
about the system load when I start htdig with about 1.25 million URLs?

2) How can I prevent htdig from following links altogether? Is there something
like a "no-follow" attribute in the config? (I did not find anything like
it.) I could set limit_urls_to to the same list as start_url, but this would
mean about 1.25 million URLs for htdig to check against every link it finds.
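
To make the second option concrete, here is a rough sketch of the configuration
I have in mind. The file path is hypothetical, and I am assuming that htdig
expands a backquoted file name into that file's contents and that the limiting
attribute is spelled limit_urls_to; please correct me if either is wrong.

```
# Sketch only: /path/to/url_list.txt is a hypothetical file, one URL per line.
# Assumption: a backquoted file name is replaced by the file's contents.
start_url:      `/path/to/url_list.txt`

# Same list again, so that only the explicitly listed URLs are accepted.
limit_urls_to:  `/path/to/url_list.txt`
```

With this setup every link found on a page would still be compared against the
full list, which is the overhead I would like to avoid.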

Does anyone know a better way?

s.m.


