Well, which failure are we talking about?  I listed three below, and the
third can itself have a number of possible causes.  Generally, though,
if the problem is one of the first two (Unknown host or Unknown proxy
host), it can be a configuration problem (you specified the wrong host
name in the start_url or http_proxy attribute).  Assuming you got these
right, it's most commonly the result of an overloaded name server or an
intermittent connection between the system running htdig and the system
running the DNS server.  The third error (Unable to build connection)
is most commonly caused by the web server going down while indexing, or
by an unreliable network connection between your system and the web server.
By the web server going down, I mean either the whole system going down,
or just the web server processes on that system being shut down.  It may
also be that an overloaded web server starts refusing connections for
a while.
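
If you want to check by hand which of these two stages is failing for a
given host, a small script along these lines can help.  This is only a
rough sketch and not part of htdig: the function name check_host and its
return strings are made up for illustration, and it only mimics the two
steps described above (name lookup, then opening a connection).

```python
import socket


def check_host(host, port=80, timeout=10.0,
               resolve=socket.gethostbyname,
               connect=socket.create_connection):
    """Classify a host as 'unknown host', 'unable to connect', or 'ok'.

    Hypothetical helper for illustration; the resolve and connect
    parameters exist only so the two steps can be swapped out.
    """
    try:
        # Step 1: DNS lookup (corresponds to the "Unknown host" case).
        address = resolve(host)
    except OSError:
        return "unknown host"
    try:
        # Step 2: open a TCP connection to the web server port
        # (corresponds to the "Unable to build connection" case).
        conn = connect((address, port), timeout)
    except OSError:
        return "unable to connect"
    conn.close()
    return "ok"


# Example use (performs real network access):
#   print(check_host("www.blabla.com"))
```

Running it a number of times, at the hour you normally index, should give
you an idea of whether the DNS lookup or the connection is the weak spot.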

Remember that it only takes one such failure for htdig to stop trying to
contact that host, so it's important that you try to address all these
possible points of failure so that indexing is reliable.  There are a
number of steps you can take to address this.

1) Make sure you have a good, solid connection to a reliable DNS server
that's not too overloaded.  If the DNS server is right on your LAN,
or better still, right on the same system on which you run htdig,
that's ideal.  htdig makes repeated DNS lookups for a host name,
once for each document that's fetched, so you want to use a DNS server
that's "close to home".  Even setting up a caching-only DNS server on
the same system where you run htdig can be a huge help, and many OS
distributions make it very easy to do that (often just by installing one
or two pre-configured packages).

2) Make sure you have a good, reliable connection to the Internet,
especially with good throughput to and from the hosts you're trying
to index.  If that network connection is overloaded or flaky, htdig is
bound to run into problems.  Again, ideally, the hosts you're indexing
will be right on your LAN, or you'll run htdig right on the web server
(so you can take advantage of the local_urls attribute), but this is
often not possible.

3) Try to index web servers at a time when they don't get a lot of
traffic (but when they're likely to remain up and running), and try not
to overload them yourself with htdig.  You can use the server_wait_time
attribute to make htdig pause between the documents it fetches from
the server.  It also helps if the web servers you index are themselves
reliable and good performers.  A reasonably fast PC, with a decent
amount of RAM, running Linux and Apache can handle a LOT of traffic
before it bogs down.  If you're running a proprietary web server on a
proprietary OS on legacy hardware, I'd expect you're more likely to run
into problems where it can't handle the load.  Of course, htdig users
don't always have a say in what web servers they index and how they are
run, but you can try to minimize the load you're adding to the system.
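
For point 1, one rough way to judge your DNS server is to repeat the same
lookup many times, much as htdig does once per document, and count the
failures and the slowest response.  The sketch below is just that, a
sketch: time_lookups is a hypothetical helper, not anything htdig
provides, and any caching done by your OS resolver will affect the
numbers.

```python
import socket
import time


def time_lookups(host, count=20,
                 resolve=socket.gethostbyname,
                 clock=time.perf_counter):
    """Resolve host `count` times; return (failures, slowest_seconds).

    Hypothetical helper for illustration; resolve and clock are
    parameters only so they can be replaced when experimenting.
    """
    failures = 0
    slowest = 0.0
    for _ in range(count):
        start = clock()
        try:
            resolve(host)
        except OSError:
            # A failed lookup is the kind of error that makes htdig
            # give up on the host, so count it.
            failures += 1
        elapsed = clock() - start
        slowest = max(slowest, elapsed)
    return failures, slowest


# Example use (performs real DNS lookups):
#   failures, slowest = time_lookups("www.blabla.com", count=50)
#   print(failures, "failed lookups; slowest took", slowest, "seconds")
```

Any failures at all, or lookups that take seconds rather than
milliseconds, would suggest trying a closer or caching-only DNS server.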

Hope these pointers help.

According to Irene talaway:
> Thanks Gilles, 
> but what can I do to avoid this failure? 
>  Gilles Detillieux <[EMAIL PROTECTED]> wrote: 
> According to Irene talaway:
> [deleted]
> 
> Generally, this message should be preceded by one of the following:
> 
> - Unknown proxy host: proxy.blabla.com
> - Unknown host: www.blabla.com
> - Unable to build connection with www.blabla.com:80
> 
> which would indicate the real reason for the failure. After such a
> failure, htdig will no longer attempt to fetch files from that server,
> and will give the "no server running" error for each document.
> 
> 
> [deleted]
> 
> > 
> > What do these lines mean?
> > Do they mean that htdig can't index file kom,close,uk.htm because
> > it fails to contact the server? 
> > What factors can make it fail to contact the server?
> 
> It's usually because of a DNS name lookup error or an inability to open
> a connection with the web server (e.g. host not up and running, or host
> not accepting connections on port 80).


-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

