At 6:19 PM -0400 6/29/01, Adolfo Santiago wrote:
>I'm pretty sure that I know what the cause is.  Our ISP isn't necessarily
>the most reliable in the world and on occasion a page may time out.
>
>But I'm kind of expecting htdig to handle this situation...

The biggest difficulty I've seen with unreliable connections is when 
the connection still exists but only trickles a few packets through 
at a very slow rate. Nothing times out; the connection just stays 
open and the dig sits there waiting.
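
If you want to cap how long a read like that can hang, the timeout 
attribute in the config file is the knob to look at (the value below 
is just an illustration; the default is 30 seconds, if I remember 
right):

    # htdig.conf -- give up on a network read that stalls
    timeout: 30        # seconds to wait on a single network read

htdig then treats the stalled document as an error instead of 
waiting on it forever.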

>process stops.  As I said, sometimes it resumes, but usually the page gets
>reported as "not found" or something like that.  Often, the process stays
>stopped.  In either case, I end up with an index that I can't use.

If a timeout occurs and a page is reported as "not found," htdig 
will just keep going. If it can no longer contact the host at all, 
it will give up on that server (as of version 3.1.5) and move on to 
the others. In either case, the database will be just fine.
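
If the failures are mostly transient, it may also be worth a look at 
the max_retries attribute, which I believe controls how many times a 
failed retrieval is re-attempted before the document is marked bad 
(check the attribute list for your version; I'm not certain it's in 
every 3.1.x release):

    # illustrative value -- re-attempt a failed fetch a few times
    max_retries: 3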

>access, and then have it go back and try to get just those again?  In our

You can prevent bad URLs from being removed by setting the 
remove_bad_urls attribute to false. On the other hand, there isn't 
an easy way to re-dig just those URLs, though an update dig is much 
faster than the initial dig.
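
For example (just a sketch; remove_bad_urls defaults to true):

    # htdig.conf -- keep documents that returned errors in the
    # database instead of letting htmerge purge them
    remove_bad_urls: false

The update dig itself is just a matter of running htdig without the 
-i (initial) flag so it reuses the existing databases, then running 
htmerge as usual.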

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
