According to Stephen P. Ryder:
> A url in the following format is indexed fine:
> http://www.casebook.org/ripper_media/book_reviews/fiction/book_detail.html?id=d29c3572c8a0d74c315b15af07723ea1
>
> While a url in the following format is totally ignored by htdig:
> http://casebook.org/forum/messages/4922/5071.html?1045783965
>
> I want both to be indexed and searchable. Any ideas? I am using the
> latest version of htdig, just installed it last week.
There are actually two "latest" versions, depending on whether or not you use the beta snapshots: 3.1.6 and 3.2.0b4-20030216. The differences between them are profound, so it's good to be specific about the exact version number. See http://www.htdig.org/FAQ.html#q5.33

At first glance, I notice that the URL that's ignored doesn't have the "www." part in the host name. That may be the problem. For example, if you have a start_url of http://www.casebook.org/ and you leave limit_urls_to at its default of ${start_url}, htdig will reject any URL that doesn't contain exactly the sequence of characters "http://www.casebook.org/".

You can find out why htdig is ignoring or rejecting URLs by looking at its verbose output. See http://www.htdig.org/FAQ.html#q5.27

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
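As a rough illustration of the limit_urls_to check described in the reply above, here is a minimal Python sketch. It assumes, as the reply states, that a URL is accepted only if it contains at least one of the listed strings as a substring. The helper passes_limit_urls_to is hypothetical and is not htdig's actual code, and the widened pattern list is an assumed fix, not a tested htdig.conf setting.

```python
from typing import List


def passes_limit_urls_to(url: str, patterns: List[str]) -> bool:
    """Hypothetical simulation: accept the URL if any pattern occurs in it as a substring."""
    return any(pattern in url for pattern in patterns)


start_url = "http://www.casebook.org/"
# Default case from the reply: limit_urls_to is ${start_url}, a single pattern.
default_limits = [start_url]

indexed = ("http://www.casebook.org/ripper_media/book_reviews/fiction/"
           "book_detail.html?id=d29c3572c8a0d74c315b15af07723ea1")
ignored = "http://casebook.org/forum/messages/4922/5071.html?1045783965"

print(passes_limit_urls_to(indexed, default_limits))  # True
print(passes_limit_urls_to(ignored, default_limits))  # False: no "www." in the host

# Assumed fix (not from the reply): also list the bare host name.
widened_limits = [start_url, "http://casebook.org/"]
print(passes_limit_urls_to(ignored, widened_limits))  # True
```

The point of the sketch is only that "http://casebook.org/..." never contains the string "http://www.casebook.org/", so a plain substring test rejects it. The verbose output mentioned in the reply is the way to confirm that this is really what htdig is doing.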

