On Tue, 6 Mar 2001, David Adams wrote:

+ Include those 10 sites on your limit_urls_to attribute in the configuration
+ file, and set
+ 
+ max_hop_count: 3

It's not that easy, surely? The hop count is based on steps from the
start_url list, not the limit_urls_to list. The hop count is also not 100%
reliable at 3.1.5 (and is documented as unreliable in update runs).

Personally, I have an application where an option to index one hop beyond
the limit_urls_to point would be nice ... I know, go write it myself!

I suspect the (immediate) practical solution, both for Ian's scenario and
for mine, is to take the output from htdig and extract the list of links
that were ruled out. That list could then be fed back into an update run
as the start_url list, with max_hop_count set to 1 for me, 3 for Ian.
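
A rough sketch of that filtering step, in Python. It assumes you have
already pulled the discovered links out of the htdig output into a plain
list, one URL per entry (how you get them depends on your verbosity level
and is not shown), and it assumes limit_urls_to patterns behave as simple
substring matches. The function names rejected_links and start_url_line
are made up for illustration; they are not part of htdig.

```python
def rejected_links(found_urls, limit_patterns):
    """Return the URLs that match none of the limit patterns.

    Order of first appearance is kept and duplicates are dropped.
    Matching is a plain substring test (an assumption about how
    limit_urls_to is applied).
    """
    seen = set()
    rejected = []
    for url in found_urls:
        url = url.strip()
        if not url or url in seen:
            continue  # skip blank entries and repeats
        seen.add(url)
        if not any(pattern in url for pattern in limit_patterns):
            rejected.append(url)
    return rejected


def start_url_line(urls):
    """Format the URLs as one whitespace-separated start_url line."""
    return "start_url: " + " ".join(urls)


if __name__ == "__main__":
    # Hypothetical sample data, not real htdig output.
    found = [
        "http://www.somesite.com/page.html",
        "http://other.example/a",
        "http://other.example/a",
    ]
    print(start_url_line(rejected_links(found, ["www.somesite.com"])))
```

The resulting line would go into the configuration for the second run,
alongside the max_hop_count you want for that run.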

regards,
        Malcolm.

 [EMAIL PROTECTED]     http://users.ox.ac.uk/~malcolm/

+ From: "Ian Lipsky" <[EMAIL PROTECTED]>
+ 
+ > Is it possible to configure htdig so that it will follow links off the main
+ > site, but limit how many pages off the start page it will go?
+ > for example:
+ >
+ > say i have www.somesite.com/page.html and page.html has links on it to 10
+ > other sites. I would want htdig to crawl those 10 other sites, but only
+ > crawl them to a depth of say 3 links off my start page.
