Thanks for the reply, Geoff.
I set remove_bad_urls to false so bad URLs are kept, and I'm hoping this
will make a difference on updates, as you indicated.
But htdig continues to hang and, also as you said, it's leaving connections
open.
Does the next version address this? Is there a patch or change I can
implement in the current source code that will streamline this a little bit?
Are there plans to implement some way to go back and revisit just the URLs
that were not available at the time of the initial scan?
Thanks!
Adolfo "Chago" Santiago
Principle of Minimum Access: "That which is not explicitly permitted is
denied."
Public Key: http://pgpkeys.mit.edu:11371/pks/lookup?op=get&search=0x4E867630
Fingerprint: 0EDB 438E 1222 6DFD B80F 4686 484D 7312 4E86 7630
-----Original Message-----
From: Geoff Hutchison [mailto:[EMAIL PROTECTED]]
Sent: Saturday, June 30, 2001 8:01 PM
To: Adolfo Santiago
Cc: [EMAIL PROTECTED]
Subject: Re: [htdig] htdig "hangs"
At 6:19 PM -0400 6/29/01, Adolfo Santiago wrote:
>I'm pretty sure that I know what the cause is. Our ISP isn't necessarily
>the most reliable in the world and on occasion a page may time out.
>
>But I'm kind of expecting htdig to handle this situation...
The biggest difficulty I've seen with unreliable connections is when
the connection exists, but just slips a few packets through at a very
slow rate. So nothing times out, but it just stays open.
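A per-read timeout resets every time a few bytes arrive, which is why a
trickling connection never trips it. The following is a minimal sketch in
Python, not htdig code, of bounding the whole transfer instead. The names
read_chunk, deadline_seconds and clock are hypothetical, chosen only for
illustration.

```python
import time


def read_with_deadline(read_chunk, deadline_seconds, clock=time.monotonic):
    """Collect chunks until EOF, or raise if the whole transfer runs too long.

    read_chunk is a hypothetical callable that returns bytes, or b"" at EOF.
    clock is injectable so the behaviour can be checked without waiting.
    """
    start = clock()
    chunks = []
    while True:
        chunk = read_chunk()
        if not chunk:
            # End of data: the transfer finished inside the deadline.
            return b"".join(chunks)
        chunks.append(chunk)
        # Data keeps arriving, so a per-read timeout would never fire.
        # Check the total elapsed time instead.
        if clock() - start > deadline_seconds:
            raise TimeoutError("transfer exceeded overall deadline")
```

The deadline is only checked between chunks, so each individual read would
still need its own timeout to cover a connection that goes completely
silent.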
>process stops. As I said, sometimes it resumes, but usually the page gets
>reported as "not found" or something like that. Often, the process stays
>stopped. In either case, I end up with an index that I can't use.
If a timeout occurs and a page is reported as "not found," then htdig
will just keep going. If it can no longer contact the host, then it
will give up (as of version 3.1.5) and take care of other servers. In
both cases, the database will be just fine.
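The two cases above can be sketched roughly as follows. This is an
illustration in Python under stated assumptions, not how htdig is actually
implemented: fetch, HostUnreachable and crawl are hypothetical names.

```python
from urllib.parse import urlsplit


class HostUnreachable(Exception):
    """Raised by the hypothetical fetch() when a server cannot be contacted."""


def crawl(urls, fetch):
    """Return (indexed, not_found, skipped) lists for the given URLs."""
    indexed, not_found, skipped = [], [], []
    dead_hosts = set()
    for url in urls:
        host = urlsplit(url).netloc
        if host in dead_hosts:
            # Already gave up on this server; move on to the others.
            skipped.append(url)
            continue
        try:
            fetch(url)
            indexed.append(url)
        except TimeoutError:
            # A single page timed out: record it and keep going.
            not_found.append(url)
        except HostUnreachable:
            # The host itself is gone: stop trying its remaining URLs.
            dead_hosts.add(host)
            skipped.append(url)
    return indexed, not_found, skipped
```

In both branches the loop carries on, so whatever was fetched successfully
remains usable.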
>access, and then have it go back and try to get just those again? In our
You can prevent bad URLs from being removed by setting the
remove_bad_urls attribute to false. On the other hand, there isn't an
easy way to re-fetch just those URLs, though an update dig is much
faster than the initial dig.
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/