At 2:14 AM -0500 12/1/01, Geoff Hutchison wrote:
>On Sat, 1 Dec 2001 [EMAIL PROTECTED] wrote:
>
>> This may be premature but I had to ask...
>>
>> I have added some urls to my start_url in htdig.conf (v3.2.0b4).
>>
>> 3 of the 5 of them have no content and 2 of them are as follows:
>> (the 5 urls are: http://slis-two.lis.fsu.edu/~G634-1/index.html
>> http://slis-two.lis.fsu.edu/~G634-2/index.html
>...
>> request on http://slis-two.lis.fsu.edu/~G634-16/index.htm
>
>OK, here's the thing. If you put a URL of a specific document in the
>start_url, it's generally not what you want if you leave limit_urls_to as
>the default (i.e. ${start_url}).

i'll have to understand later i guess... please read on...

>You'll just index those 5 URLs (or whatever). In this case, you also seem
>to have differences of opinion as to whether the document is really .html
>or .htm. (It seems like it's .html since the "second" indexing couldn't
>find the .htm.)

i think it's really .htm -i'm sure it is. but some of these urls have no content whatsoever. what does htdig do then?

>By default, htdig strips off trailing "index.html" in favor of the bare
>/. Shorter URLs, etc. <http://www.htdig.org/attrs.html#remove_default_doc>
>In your case, this is getting you into trouble, since you've set the
>start_urls to point to specific documents and left limit_urls_to. So the
>index.html is stripped, but it then doesn't match the limits, so nothing
>is indexed.
>
>You'd find life much easier if you just index directory URLs or set
>limit_urls_to explicitly.

ok, yes, i see now -i was confused about our structure here a bit. what does, "set limit_urls_to explicitly" mean, say in this context or in a general way? (sorry).

thanks again.

TR
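
As an illustration of the advice quoted above, setting limit_urls_to explicitly might look roughly like the fragment below. This is only a sketch: the two document URLs are the ones given in the original post, while the broader server prefix used for limit_urls_to is an assumption about which part of the site should be indexed, not something stated in the thread.

```
# Illustrative htdig.conf fragment, not a tested configuration.
# start_url keeps the specific documents from the original post;
# limit_urls_to is set explicitly to a broader prefix (an assumption
# about which part of the server should be indexed) instead of being
# left at its default of ${start_url}.
start_url:      http://slis-two.lis.fsu.edu/~G634-1/index.html \
                http://slis-two.lis.fsu.edu/~G634-2/index.html
limit_urls_to:  http://slis-two.lis.fsu.edu/
```

With a prefix like this, a URL that has had its trailing index.html stripped should still fall within the limit. The trade-off is that the crawl may follow links to anything else under that prefix. The other option mentioned in the reply is to list directory URLs (ending in /) in start_url and leave limit_urls_to at its default.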

