At 6:19 PM -0400 6/29/01, Adolfo Santiago wrote:
>I'm pretty sure that I know what the cause is. Our ISP isn't necessarily
>the most reliable in the world and on occasion a page may time out.
>
>But I'm kind of expecting htdig to handle this situation...
The biggest difficulty I've seen with unreliable connections is when
the connection exists but only slips a few packets through at a very
slow rate. Nothing times out; the connection just stays open.
>process stops. As I said, sometimes it resumes, but usually the page gets
>reported as "not found" or something like that. Often, the process stays
>stopped. In either case, I end up with an index that I can't use.
If a timeout occurs and a page is reported as "not found," then htdig
will just keep going. If it can no longer contact the host at all, it
will give up on that host (as of version 3.1.5) and move on to the
other servers. In both cases, the database will be just fine.
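For reference, the length of that timeout is set by the timeout
attribute in htdig.conf. A sketch (30 seconds is the default in the
versions I've used; check the attribute docs for yours):

    # htdig.conf: fail a network read after 10 seconds of silence
    # (each packet that arrives resets this timer, so a slow trickle
    # can still keep the connection alive indefinitely)
    timeout: 10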
>access, and then have it go back and try to get just those again? In our
You can prevent bad URLs from being removed by setting the
remove_bad_urls attribute to false. On the other hand, there isn't an
easy way to re-dig just those URLs, though an update dig is much
faster than the initial dig.
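Something like this should do it (an untested sketch assuming a
stock 3.1.x setup; your paths will differ):

    # in htdig.conf: keep unreachable URLs in the database so a
    # later dig will try them again
    remove_bad_urls: false

Then run an update dig (htdig without the -i flag) so it reuses the
existing databases instead of starting over, and rebuild the search
indexes with htmerge as usual:

    htdig -v -c /path/to/htdig.conf
    htmerge -v -c /path/to/htdig.conf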
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html