Well, which failure are we talking about? I listed three below, and the third can have a number of possible causes. Generally, though, if the problem is one of the first two (Unknown host or Unknown proxy host), it can be a configuration problem (you specified the wrong host name in the start_url or http_proxy attribute). Assuming you got these right, it's most commonly the result of an overloaded name server or an intermittent connection between the system running htdig and the system running the DNS server. The third error (Unable to build connection) is most commonly caused by the web server going down while indexing, or by an unreliable network connection between your system and the web server. By the web server going down, I mean either the whole system going down, or just the web server processes on that system being shut down. It may also be that an overloaded web server starts refusing connections for a while.
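To tell these cases apart before starting a long dig, a small standalone check can help. The following is only a rough sketch in Python, not anything htdig itself provides: the function name check_host and the result strings are made up for illustration, and it only separates a failed name lookup from a failed TCP connection, which roughly corresponds to "Unknown host" versus "Unable to build connection".

```python
import socket
import sys


def check_host(host, port=80, timeout=5.0):
    """Hypothetical helper: classify whether host:port is reachable.

    Returns "ok", "unknown host" (name lookup failed) or
    "unable to build connection" (lookup worked, TCP connect did not).
    """
    try:
        # Name lookup step. A failure here is the "Unknown host" case.
        candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unknown host"

    # Connection step. Try each resolved address until one accepts.
    for family, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            # Refused, timed out or unreachable: try the next address.
            continue
    return "unable to build connection"


if __name__ == "__main__":
    # Host names are taken from the command line.
    for name in sys.argv[1:]:
        print(name, "->", check_host(name))
```

Run it with the host names from your start_url (and your proxy host, with its port adjusted in the code, if you use one). Since these failures are often intermittent, a single "ok" doesn't prove much; running the check repeatedly over the period when you normally index is more telling.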
Remember that it only takes one such failure for htdig to stop trying to contact that host, so it's important that you try to address all these possible points of failure so that indexing is reliable. There are a number of steps you can take to address this.

1) Make sure you have a good, solid connection to a reliable DNS server that's not too overloaded. If the DNS server is right on your LAN, or better still, right on the same system on which you run htdig, that's ideal. htdig makes repeated DNS lookups for a host name, once for each document that's fetched, so you want to use a DNS server that's "close to home". Even setting up a caching-only DNS server on the same system where you run htdig can be a huge help, and many OS distributions make it very easy to do that (often just by installing one or two pre-configured packages).

2) Make sure you have a good, reliable connection to the Internet, especially with good throughput to and from the hosts you're trying to index. If that network connection is overloaded or flaky, htdig is bound to run into problems. Again, ideally, the hosts you're indexing will be right on your LAN, or you'll run htdig right on the web server (so you can take advantage of the local_urls attribute), but this is often not possible.

3) Try to index web servers at a time when they don't get a lot of traffic (but when they're likely to remain up and running), and try not to overload them yourself with htdig. You can use the server_wait_time attribute to make htdig pause between each document you fetch from the server. It also helps if the web servers you index are themselves reliable and good performers. A reasonably fast PC, with a decent amount of RAM, running Linux and Apache can handle a LOT of traffic before it bogs down. If you're running a proprietary web server on a proprietary OS on legacy hardware, I'd expect you're more likely to run into problems where it can't handle the load.
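For point 1, one way to get a feel for how "close to home" your name server is: time a batch of repeated lookups for the host you index, much as htdig does one lookup per document. This is a rough sketch in Python, not part of htdig; the name time_lookups and the default count of 20 are arbitrary choices for illustration. Note that if your system already runs a local resolver cache, the repeated lookups will mostly measure that cache.

```python
import socket
import sys
import time


def time_lookups(host, count=20):
    """Hypothetical helper: resolve `host` `count` times in a row.

    Returns (failures, timings), where timings holds the elapsed
    seconds of every attempt, failed or not.
    """
    failures = 0
    timings = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            socket.getaddrinfo(host, 80, type=socket.SOCK_STREAM)
        except socket.gaierror:
            # Count the failed lookup but keep going.
            failures += 1
        timings.append(time.perf_counter() - start)
    return failures, timings


if __name__ == "__main__":
    # Host names are taken from the command line.
    for name in sys.argv[1:]:
        failed, times = time_lookups(name)
        print(f"{name}: {failed} failed lookups, slowest {max(times):.3f}s")
```

Any failed lookups at all, or an occasional lookup that takes seconds rather than milliseconds, would suggest the name server or the path to it is worth looking at before blaming the web server.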
Of course, htdig users don't always have a say in what web servers they index and how they are run, but you can try to minimize the load you're adding to the system. Hope these pointers help.

According to Irene talaway:
> Thanks Gilles,
> but what can I do to avoid this failure?
> Gilles Detillieux <[EMAIL PROTECTED]> wrote:
> According to Irene talaway:
> [deleted]
>
> Generally, this message should be preceded by one of the following:
>
> - Unknown proxy host: proxy.blabla.com
> - Unknown host: www.blabla.com
> - Unable to build connection with www.blabla.com:80
>
> which would indicate the real reason for the failure. After such a
> failure, htdig will no longer attempt to fetch files from that server,
> and will give the "no server running" error for each document.
>
> > [deleted]
> >
> > What do these lines mean?
> > Do they mean that htdig can't index file kom,close,uk.htm because
> > it fails to contact the server?
> > What factors can make it fail to contact the server?
>
> It's usually because of a DNS name lookup error or an inability to open
> a connection with the web server (e.g. host not up and running, or host
> not accepting connections on port 80).

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

