At 2:14 AM -0500 12/1/01, Geoff Hutchison wrote:
>On Sat, 1 Dec 2001 [EMAIL PROTECTED] wrote:
>
>>  This may be premature but I had to ask...
>>
>>  I have added some urls to my start_url in htdig.conf (v3.2.0b4).
>>
>>  3 of the 5 of them have no content and 2 of them are as follows:
>>  (the 5 urls are: http://slis-two.lis.fsu.edu/~G634-1/index.html
>>  http://slis-two.lis.fsu.edu/~G634-2/index.html
>...
>>  request on http://slis-two.lis.fsu.edu/~G634-16/index.htm
>
>OK, here's the thing. If you put a URL of a specific document in the
>start_url, it's generally not what you want if you leave limit_urls_to as
>the default (i.e. ${start_url}).

I'll have to understand that later, I guess... please read on...

>You'll just index those 5 URLs (or whatever). In this case, you also seem
>to have differences of opinion as to whether the document is really .html
>or .htm. (It seems like it's .html since the "second" indexing couldn't
>find the .htm.)

I think it's really .htm; I'm sure it is. But some of these URLs have
no content whatsoever. What does htdig do then?

>By default, htdig strips off trailing "index.html" in favor of the bare
>/. Shorter URLs, etc. <http://www.htdig.org/attrs.html#remove_default_doc>
>In your case, this is getting you into trouble, since you've set the
>start_urls to point to specific documents and left limit_urls_to. So the
>index.html is stripped, but it then doesn't match the limits, so nothing
>is indexed.
>
>You'd find life much easier if you just index directory URLs or set
>limit_urls_to explicitly.

OK, yes, I see now. I was a bit confused about our structure here.

What does "set limit_urls_to explicitly" mean, either in this context
or in general? (Sorry.)
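
To make the question concrete, here is a rough Python sketch of how I
read the explanation above. It is only my model, not htdig's actual
code: the function names, the plain substring match, and the
"explicit" limit value are all my own assumptions.

```python
# Toy model of the behaviour described above; NOT htdig's real code.
# Assumption: "index.html" is the only default document that gets stripped.
DEFAULT_DOCS = ("index.html",)


def normalize(url):
    """Strip a trailing default document name, leaving the bare directory URL."""
    for doc in DEFAULT_DOCS:
        if url.endswith("/" + doc):
            return url[: -len(doc)]
    return url


def within_limits(url, limits):
    """Assumed rule: a URL is accepted if any limit string occurs inside it."""
    return any(pattern in url for pattern in limits)


start_url = ["http://slis-two.lis.fsu.edu/~G634-1/index.html"]
candidate = normalize(start_url[0])

# Default case: limit_urls_to is ${start_url}, i.e. the full document URL.
default_limits = start_url
print(candidate, within_limits(candidate, default_limits))

# Hypothetical explicit limit: a prefix shared by the class directories.
explicit_limits = ["http://slis-two.lis.fsu.edu/~G634-"]
print(candidate, within_limits(candidate, explicit_limits))
```

If this model is right, the first check fails because the stripped URL
no longer contains ".../index.html", and the second succeeds because
the shorter prefix still matches. Is that roughly what an explicit
limit_urls_to would do?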

thanks again.

TR

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
