The configuration file for htdig seems geared to the idea of a
few top-level URLs, with lots of pages underneath these. We're
working on an application that will have thousands of top-level
URLs. Most of these will have only a few pages underneath them.
The total number of pages is likely between 10,000 and 100,000,
which seems within the normal range. It's just that we have a
forest of small shrubs (thousands times dozens) instead of a few
huge sequoias (dozens times thousands).
(1) Obviously, we can automatically generate a config file that
lists thousands of start URLs. Will this create performance
problems, i.e., does it take significantly longer to dig through
thousands of small shrubs than through a few huge trees? Is this
kind of use intended?
(2) Has anyone used htdig in this fashion?
(3) Alternatively, what web crawlers integrate well with htmerge?
(4) Intuitively, this is a concern only for htdig, not htmerge or
htsearch. Is there any reason I should question this intuition?
Thanks!
Russell
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html