Hi all,
In HttpClient 3.1, the Nutch code base would configure timeouts using the
following snippet of code:
MultiThreadedHttpConnectionManager connectionManager =
new MultiThreadedHttpConnectionManager();
HttpClient client = new HttpClient(connectionManager);
HttpConnectionManagerParams params = connectionManager.getParams();
params.setConnectionTimeout(timeout);
params.setSoTimeout(timeout);
// executeMethod(HttpMethod) seems to ignore the connection timeout on the
// connection manager, so set it explicitly on the HttpClient.
client.getParams().setConnectionManagerTimeout(timeout);
What's the functional equivalent in 4.0? I'm assuming that:
HttpParams params = new BasicHttpParams();
ConnManagerParams.setTimeout(params, timeout);
is equivalent to the 3.1 call to
params.setConnectionTimeout(timeout). But what about the
setSoTimeout() call?
One reason I'm asking is that I ran into a very long timeout while
trying to fetch a page. The wire log looked like:
09/05/04 16:03:39 DEBUG client.DefaultRequestDirector:408 - Attempt 1 to execute request
09/05/04 16:03:39 DEBUG http.wire:78 - >> "GET /noticias/elrostrodeanaliacanal9-argentina-elrostrodeanalia-canal9/ HTTP/1.1[EOL]"
09/05/04 16:03:39 DEBUG http.wire:78 - >> "Host: telenovelas.censuratv.net[EOL]"
09/05/04 16:03:39 DEBUG http.wire:78 - >> "Connection: Keep-Alive[EOL]"
09/05/04 16:03:39 DEBUG http.wire:78 - >> "User-Agent: bixo[EOL]"
09/05/04 16:03:39 DEBUG http.wire:78 - >> "[EOL]"
09/05/04 16:03:39 DEBUG http.headers:251 - >> GET /noticias/elrostrodeanaliacanal9-argentina-elrostrodeanalia-canal9/ HTTP/1.1
09/05/04 16:03:39 DEBUG http.headers:254 - >> Host: telenovelas.censuratv.net
09/05/04 16:03:39 DEBUG http.headers:254 - >> Connection: Keep-Alive
09/05/04 16:03:39 DEBUG http.headers:254 - >> User-Agent: bixo
09/05/04 16:13:32 DEBUG conn.DefaultClientConnection:160 - Connection closed
09/05/04 16:13:32 DEBUG client.DefaultRequestDirector:414 - Closing the connection.
09/05/04 16:13:32 DEBUG conn.DefaultClientConnection:160 - Connection closed
09/05/04 16:13:32 INFO client.DefaultRequestDirector:418 - I/O exception (org.apache.http.NoHttpResponseException) caught when processing request: The target server failed to respond
09/05/04 16:13:32 DEBUG client.DefaultRequestDirector:423 - The target server failed to respond
09/05/04 16:13:32 INFO client.DefaultRequestDirector:425 - Retrying request
09/05/04 16:13:32 DEBUG client.DefaultRequestDirector:433 - Reopening the direct connection.
09/05/04 16:14:24 DEBUG conn.DefaultClientConnection:147 - Connection shut down
09/05/04 16:14:24 DEBUG tsccm.ThreadSafeClientConnManager:223 - Released connection is not reusable.
09/05/04 16:14:24 DEBUG tsccm.ConnPoolByRoute:374 - Releasing connection [HttpRoute[{}->http://telenovelas.censuratv.net]][null]
09/05/04 16:14:24 DEBUG tsccm.ConnPoolByRoute:631 - Notifying no-one, there are no waiting threads
09/05/04 16:14:24 DEBUG http.HttpClientFetcher:267 - Exception while fetching url http://telenovelas.censuratv.net/noticias/elrostrodeanaliacanal9-argentina-elrostrodeanalia-canal9/
java.net.UnknownHostException: telenovelas.censuratv.net
So the first try failed after about 10 minutes with an I/O exception
(org.apache.http.NoHttpResponseException), then the retry failed much
faster (about 50 seconds) with a java.net.UnknownHostException.
I'm guessing that maybe the real cause of the first long timeout was
my DNS system timing out while trying to resolve the invalid server
address, and then this "bad hostname" result was cached so that the
retry failed faster.
But independent of the above, I'm interested in the best way to
prevent all cases of long timeouts, with 4.0.
Thanks much!
-- Ken
PS - I could do my own DNS resolver that maps hostnames to IP
addresses, and wrap this with a timer to fail after 10 seconds or so.
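
To make the PS idea concrete, here is a rough sketch of what I have in
mind. It uses only the JDK. The class name TimedDnsResolver and its
resolve() method are made up for illustration and are not part of
HttpClient. One caveat I'm aware of: a lookup that times out isn't really
cancelled, it just keeps running on a background daemon thread until the
OS resolver gives up.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: resolves a hostname but gives up after a fixed time.
public class TimedDnsResolver {

    // Daemon threads, so a stuck lookup never keeps the JVM alive.
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-dns-resolver");
        t.setDaemon(true);
        return t;
    });

    public InetAddress resolve(final String hostname, long timeoutMillis)
            throws UnknownHostException {
        Callable<InetAddress> lookup = () -> InetAddress.getByName(hostname);
        Future<InetAddress> future = executor.submit(lookup);
        try {
            // Wait at most timeoutMillis for the lookup to finish.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort only: the blocked lookup may ignore the interrupt.
            future.cancel(true);
            throw new UnknownHostException("DNS lookup timed out: " + hostname);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof UnknownHostException) {
                throw (UnknownHostException) cause;
            }
            throw new UnknownHostException(hostname + ": " + cause);
        } catch (InterruptedException e) {
            // Restore the interrupt flag for the caller.
            Thread.currentThread().interrupt();
            throw new UnknownHostException("Interrupted while resolving " + hostname);
        }
    }
}
```

I'd call resolve(hostname, 10000) before handing the request to
HttpClient, and treat an UnknownHostException as a failed fetch.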
--
Ken Krugler
+1 530-210-6378