Eric Luhrs's bits of Tue, 19 Mar 2002 translated to:
>
>I can't get htdig (v3.2.b3) to accept the URLs I specify. Here's what I
>have:

The usual advice is to move to a 3.2.0b4 snapshot if possible. There have been a lot of fixes since the 3.2.0b3 release.

> start_url: http://www.shaksper.net/~eluhrs/sites.html
> limit_urls_to: ${start_url}

Since you are explicitly supplying the initial page (sites.html), I think you would want something more like

    limit_urls_to: http://www.shaksper.net/~eluhrs/

This assumes everything that you want to index is under http://www.shaksper.net/~eluhrs/. The limit_urls_to attribute is defined as a list of patterns; only one page will ever match the http://www.shaksper.net/~eluhrs/sites.html pattern.

>I expect htdig to spider each link specified in sites.html, and index
>everything ON those site, but not links to OTHER sites. Instead,
>htdig -vvv gives me a bunch of errors like this:
>
> href: http://www.motleyltd.com.au/ (http://www.motleyltd.com.au/)
> Rejected: URL not in the limits!
> url rejected: (level 1)http://www.motleyltd.com.au/

If http://www.motleyltd.com.au/ is not included in limit_urls_to, nothing on this site will be indexed.

>I only want to index pages WITHIN the sites that I specify, and nothing
>else. What am I missing here?

limit_urls_to should consist of patterns that define the sites to which you want to limit the dig. Any URL that does not have a substring matching something in limit_urls_to will be rejected as "not in the limits!".

Jim
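
To make the substring rule described above concrete, here is a minimal Python sketch. It is not htdig's code: the function name in_limits and the "wide" pattern list are hypothetical, and it models only the plain substring test, ignoring anything else htdig does to URLs before matching.

```python
def in_limits(url, limit_patterns):
    """Return True if any pattern occurs as a substring of url.

    Simplified model of the limit_urls_to check, for illustration only.
    """
    return any(pattern in url for pattern in limit_patterns)


# Pattern from the original configuration: limit_urls_to is the start_url itself.
narrow = ["http://www.shaksper.net/~eluhrs/sites.html"]

# Hypothetical wider list that also names the external site to be indexed.
wide = ["http://www.shaksper.net/~eluhrs/", "http://www.motleyltd.com.au/"]

rejected_url = "http://www.motleyltd.com.au/"

print(in_limits(rejected_url, narrow))  # False: no pattern is a substring of the URL
print(in_limits(rejected_url, wide))    # True: the site's own pattern matches
```

Under this model, every site linked from sites.html that should be indexed needs its own pattern in the list; a URL that matches none of them is rejected.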

