Eric Luhrs's bits of Tue, 19 Mar 2002 translated to:

>
>I can't get htdig (v3.2.b3) to accept the URLs I specify. Here's what I
>have:

The usual advice is to move to a 3.2.0b4 snapshot if possible.
There have been a lot of fixes since the 3.2.0b3 release.

>       start_url: http://www.shaksper.net/~eluhrs/sites.html
>       limit_urls_to: ${start_url}

Since you are explicitly supplying the initial page (sites.html),
I think you would want something more like

  limit_urls_to: http://www.shaksper.net/~eluhrs/

for limit_urls_to. This assumes everything that you want to index
is under http://www.shaksper.net/~eluhrs/. The limit_urls_to
attribute is defined as a list of patterns; only one page, sites.html
itself, will ever match the http://www.shaksper.net/~eluhrs/sites.html
pattern.

>I expect htdig to spider each link specified in sites.html, and index
>everything ON those site, but not links to OTHER sites.  Instead,
>htdig -vvv gives me a bunch of errors like this:
>
>       href: http://www.motleyltd.com.au/ (http://www.motleyltd.com.au/)
>        Rejected: URL not in the limits!
>        url rejected: (level 1)http://www.motleyltd.com.au/

If http://www.motleyltd.com.au/ is not included in limit_urls_to,
nothing on this site will be indexed.

>I only want to index pages WITHIN the sites that I specify, and nothing
>else. What am I missing here?

limit_urls_to should consist of patterns that define the sites to
which you want to limit the dig. Any URL that does not have a
substring matching something in limit_urls_to will be rejected as
"not in the limits!".
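To make the rule concrete, here is a minimal Python sketch of the
substring check described above. It is not htdig's actual source: it
assumes limit_urls_to is a whitespace-separated list of plain
substrings, and the helper names parse_limits and in_limits, as well
as the motleyltd sub-page, are hypothetical and used only for
illustration.

```python
from typing import List


def parse_limits(value: str) -> List[str]:
    """Split a limit_urls_to value into its individual patterns.

    Assumption: patterns are separated by whitespace.
    """
    return value.split()


def in_limits(url: str, patterns: List[str]) -> bool:
    """Accept a URL if any pattern occurs anywhere inside it (substring match)."""
    return any(pattern in url for pattern in patterns)


if __name__ == "__main__":
    # The setting from the original question: only the start page matches.
    original = parse_limits("http://www.shaksper.net/~eluhrs/sites.html")
    # A revised setting: the home directory plus each site to be indexed.
    revised = parse_limits(
        "http://www.shaksper.net/~eluhrs/ http://www.motleyltd.com.au/"
    )

    for url in (
        "http://www.shaksper.net/~eluhrs/sites.html",
        "http://www.motleyltd.com.au/",
        # Hypothetical page deeper inside one of the listed sites.
        "http://www.motleyltd.com.au/some/page.html",
    ):
        print(url, in_limits(url, original), in_limits(url, revised))
```

Under the original setting only sites.html passes; with the revised
list, every URL on the listed sites passes, while links to sites not
in the list would still be rejected.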

Jim


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html