On Sat, 1 Dec 2001 [EMAIL PROTECTED] wrote:
> This may be premature but I had to ask...
>
> I have added some urls to my start_url in htdig.conf (v3.2.0b4).
>
> 3 of the 5 of them have no content and 2 of them are as follows:
> (the 5 urls are: http://slis-two.lis.fsu.edu/~G634-1/index.html
> http://slis-two.lis.fsu.edu/~G634-2/index.html
...
> request on http://slis-two.lis.fsu.edu/~G634-16/index.htm
OK, here's the thing. If you put the URL of a specific document in
start_url and leave limit_urls_to at its default (i.e. ${start_url}),
you generally won't get what you want.
At best you'll index just those 5 URLs (or however many you listed). In
this case there also seems to be some disagreement over whether the
documents are really .html or .htm. (They're probably .html, since the
"second" indexing run couldn't find the .htm.)
By default, htdig strips a trailing "index.html" and keeps the bare /
instead (shorter URLs, among other things). See
<http://www.htdig.org/attrs.html#remove_default_doc>
In your case this gets you into trouble, because you've pointed start_url
at specific documents and left limit_urls_to at its default. So the
index.html is stripped, the shortened URL no longer matches the limit
(which still ends in index.html), and nothing is indexed.
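To make the failure concrete, here is a minimal Python sketch of the two steps described above. It is a hypothetical simplification, not htdig's actual code: it assumes remove_default_doc is just "index.html" and that limit_urls_to accepts a URL when any pattern appears in it as a substring.

```python
# Hypothetical simplification of htdig's URL handling, for illustration only.
REMOVE_DEFAULT_DOC = ["index.html"]  # assumed default for remove_default_doc


def strip_default_doc(url: str) -> str:
    """Drop a trailing default document name, keeping the trailing slash."""
    for doc in REMOVE_DEFAULT_DOC:
        if url.endswith("/" + doc):
            return url[: -len(doc)]
    return url


def allowed(url: str, limit_urls_to: list) -> bool:
    """Assumed rule: accept a URL if it contains any limit pattern."""
    return any(pattern in url for pattern in limit_urls_to)


start_url = "http://slis-two.lis.fsu.edu/~G634-1/index.html"
limit_urls_to = [start_url]  # the default: ${start_url}

normalized = strip_default_doc(start_url)
print(normalized)                          # http://slis-two.lis.fsu.edu/~G634-1/
print(allowed(normalized, limit_urls_to))  # False: the limit still ends in index.html
```

The shortened URL is a prefix of the limit pattern, not the other way round, so the substring test fails and the start URL itself is rejected.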
You'd find life much easier if you either use directory URLs in
start_url or set limit_urls_to explicitly.
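Under the same simplified, hypothetical model (stripping "index.html" and then substring-matching against limit_urls_to), both remedies can be checked in a few lines. The host prefix used as an explicit limit is only an example; pick whatever prefix covers the pages you actually want indexed.

```python
# Hypothetical simplification of htdig's URL handling, for illustration only.
# Roughly equivalent htdig.conf settings (assumed):
#   start_url:     http://slis-two.lis.fsu.edu/~G634-1/ http://slis-two.lis.fsu.edu/~G634-2/
#   limit_urls_to: http://slis-two.lis.fsu.edu/


def strip_default_doc(url: str, default_docs=("index.html",)) -> str:
    """Drop a trailing default document name, keeping the trailing slash."""
    for doc in default_docs:
        if url.endswith("/" + doc):
            return url[: -len(doc)]
    return url


def allowed(url: str, limit_urls_to: list) -> bool:
    """Assumed rule: accept a URL if it contains any limit pattern."""
    return any(pattern in url for pattern in limit_urls_to)


base = "http://slis-two.lis.fsu.edu/~G634-"
sections = ["1", "2"]

# Remedy 1: directory URLs in start_url; the default limit (${start_url})
# is then the directory itself, which the stripped URL still matches.
dir_starts = [f"{base}{n}/" for n in sections]
remedy1_ok = all(allowed(strip_default_doc(u), dir_starts) for u in dir_starts)

# Remedy 2: keep the document URLs but set limit_urls_to explicitly.
doc_starts = [f"{base}{n}/index.html" for n in sections]
explicit_limits = ["http://slis-two.lis.fsu.edu/"]
remedy2_ok = all(allowed(strip_default_doc(u), explicit_limits) for u in doc_starts)

print(remedy1_ok, remedy2_ok)  # True True
```

In both cases the limit pattern is no longer longer than the normalized URL, so stripping index.html can't push the start URL outside the limits.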
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html