According to Stephen P. Ryder:
> A url in the following format is indexed fine:
> http://www.casebook.org/ripper_media/book_reviews/fiction/book_detail.html?id=d29c3572c8a0d74c315b15af07723ea1
>  
> While a url in the following format is totally ignored by htdig:
> http://casebook.org/forum/messages/4922/5071.html?1045783965
>  
> I want both to be indexed and searchable.  Any ideas?  I am using the
> latest version of htdig, just installed it last week.

There are actually two latest versions, depending on whether you
use the beta snapshots or not.  They are 3.1.6 and 3.2.0b4-20030216.
The differences are profound, so it's good to be specific about the
actual number.  See http://www.htdig.org/FAQ.html#q5.33

At first glance at the URLs above, I see that the one that's ignored
doesn't have the "www." part in the host name.  That may be the
problem.  E.g., if you have a start_url of http://www.casebook.org/,
and you leave the limit_urls_to as the default of ${start_url},
htdig will reject any URL that doesn't have exactly the sequence of
characters "http://www.casebook.org/" in it.  You can find out why
htdig is ignoring or rejecting URLs by looking at the verbose output.
See http://www.htdig.org/FAQ.html#q5.27
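
To make the substring test concrete, here is a rough Python sketch of
the check described above.  It is only a simplified model, not htdig's
actual code: the function name within_limits and the two lists are made
up for illustration, and it assumes limit_urls_to is the only attribute
rejecting the URL (other settings can reject URLs too).

```python
# Simplified model of the limit_urls_to check (not htdig's real code):
# a URL passes only if it contains at least one of the configured
# patterns as a plain substring.
def within_limits(url, limit_urls_to):
    return any(pattern in url for pattern in limit_urls_to)


# The two URLs from the original question.
indexed = ("http://www.casebook.org/ripper_media/book_reviews/fiction/"
           "book_detail.html?id=d29c3572c8a0d74c315b15af07723ea1")
ignored = "http://casebook.org/forum/messages/4922/5071.html?1045783965"

# Assumed setup: start_url is http://www.casebook.org/ and
# limit_urls_to is left at its default of ${start_url}.
default_limits = ["http://www.casebook.org/"]

# Hypothetical alternative: list both forms of the host name.
both_hosts = ["http://www.casebook.org/", "http://casebook.org/"]

print(within_limits(indexed, default_limits))  # True
print(within_limits(ignored, default_limits))  # False
print(within_limits(ignored, both_hosts))      # True
```

If this is indeed the cause, listing both forms of the host name in
limit_urls_to should let the forum URLs through.  The verbose output
remains the way to confirm the actual reason for the rejection.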

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

