Hello there, I'm indexing a database-driven website with about 10,000 text pages, about 1,000,000 forum entries and 250,000 public user records. For good reasons I'd rather not let htdig spider the pages; I would prefer to list them explicitly in start_url.
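To make the setup concrete, here is a minimal sketch of how such an explicit URL list could be generated from the database IDs. The URL patterns (/page/, /forum/, /user/), the host name and the output file name are placeholders for illustration only, not the real site layout, and how the resulting file gets fed into start_url is left open here.

```python
def build_start_urls(base, page_ids, forum_ids, user_ids):
    """Return one URL per record: pages first, then forum entries, then users.

    The path patterns below are hypothetical; replace them with the
    site's real URL scheme.
    """
    urls = [f"{base}/page/{i}" for i in page_ids]
    urls += [f"{base}/forum/{i}" for i in forum_ids]
    urls += [f"{base}/user/{i}" for i in user_ids]
    return urls


def write_url_list(urls, path):
    """Write the URLs to a text file, one per line, and return the count."""
    with open(path, "w", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")
    return len(urls)


if __name__ == "__main__":
    # Tiny ID ranges stand in for the real database queries.
    urls = build_start_urls("http://www.example.com", range(1, 4), range(1, 3), range(1, 2))
    count = write_url_list(urls, "start_urls.txt")
    print(f"wrote {count} URLs")
```

With the real ID ranges this produces roughly 1.25 million lines, which is the list the questions below are about.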
1) Is there a limit on the number of URLs in the start_url attribute? What happens to the system load when I start htdig with 1.25 million URLs?

2) How can I prevent htdig from following any links at all? Is there something like a "no-follow" attribute in the config? (I did not find anything like it.) I could set limit_urls_to to the same list as start_url, but then htdig would have to check every link it finds against 1.25 million URLs.

Does anyone know a better way?

s.m.

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

