According to Geoff Hutchison:
> On Fri, 11 Aug 2000 [EMAIL PROTECTED] wrote:
> > syntax in the config files so I know that it isn't that. I'm not sure
> > if it makes a difference but these start URLs all contain /cgi-bin/ and the
>
> I'd make sure you've set the exclude_urls appropriately. Remember that the
> default is to exclude cgi-bin.
My exclude_urls is set to just .gif, so cgi-bin URLs should not be excluded.
> Also check limit_urls_to. By default, it takes on the value of start_url,
> which won't do if you list very specific URLs in this parameter, because
> your limit_urls_to won't be open-ended enough to allow other URLs.
As an example, all of the URLs in my start_url look similar to
http://www.foo.ca/cgi-bin/foo2/foo3/foo4/rp_tocs_e?bcb_bcb3-00_78
except that the part after the ? changes.
Each of those pages links to several URLs that look like
http://different.server.ca/cgi-bin/blah/blah/blah/ViewDoc?journal=one&volume=2&file=3.pdf
where the info after the ? changes.
My limit_urls_to attribute looks like
http://www.foo.ca/cgi-bin/foo2/foo3/foo4/rp_tocs_e? \
http://different.server.ca/cgi-bin/blah/blah/blah/RPViewDoc
so I can't see a problem with that. The strange thing is that the dig
goes through only about 15 of the 50 start_url URLs and then merges. It
seems to me that htdig thinks it has finished digging, and I can't
pinpoint why.
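
The settings above can be sanity-checked outside htdig. What follows is
only a rough sketch in Python, assuming that limit_urls_to and
exclude_urls behave as plain substring tests. url_allowed is a
hypothetical helper, not part of ht://Dig, and the sample URLs are the
anonymized ones from this message.

```python
# Hypothetical helper, NOT part of ht://Dig. It assumes (an assumption, not
# confirmed here) that both attributes are matched as plain substrings.

def url_allowed(url, limit_urls_to, exclude_urls):
    """Return True if url contains no exclude pattern and at least one limit pattern."""
    if any(pattern in url for pattern in exclude_urls):
        return False
    return any(pattern in url for pattern in limit_urls_to)


# Values copied from the message above.
limit_urls_to = [
    "http://www.foo.ca/cgi-bin/foo2/foo3/foo4/rp_tocs_e?",
    "http://different.server.ca/cgi-bin/blah/blah/blah/RPViewDoc",
]
exclude_urls = [".gif"]

start_example = "http://www.foo.ca/cgi-bin/foo2/foo3/foo4/rp_tocs_e?bcb_bcb3-00_78"
linked_example = (
    "http://different.server.ca/cgi-bin/blah/blah/blah/"
    "ViewDoc?journal=one&volume=2&file=3.pdf"
)

if __name__ == "__main__":
    for url in (start_example, linked_example):
        verdict = "allowed" if url_allowed(url, limit_urls_to, exclude_urls) else "rejected"
        print(verdict, url)
```

With the values exactly as posted, the start URL passes, but the ViewDoc
link does not contain the RPViewDoc pattern. That may only be a slip in
anonymizing the example, but it is worth comparing against the real
config file.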
> So one way to get more information on this
> is to run htdig by itself and add the -vvvv flag for more debugging
> information.
I ran the dig with -vvv and the output seemed fine: it was following
all links, indexing the PDFs, and parsing them perfectly.
I'm stumped,
Sheri