Hi all,

In HttpClient 3.1, the Nutch code base configured timeouts using the following snippet of code:

    MultiThreadedHttpConnectionManager connectionManager =
          new MultiThreadedHttpConnectionManager();

    HttpClient client = new HttpClient(connectionManager);

    HttpConnectionManagerParams params = connectionManager.getParams();
    params.setConnectionTimeout(timeout);
    params.setSoTimeout(timeout);

    // executeMethod(HttpMethod) seems to ignore the connection timeout on the connection manager.
    // set it explicitly on the HttpClient.

    client.getParams().setConnectionManagerTimeout(timeout);

What's the functional equivalent in 4.0? I'm assuming that:

    HttpParams params = new BasicHttpParams();
    ConnManagerParams.setTimeout(params, timeout);

is equivalent to the 3.1 call to params.setConnectionTimeout(timeout). But what about the setSoTimeout() call?

One reason I'm asking is that I ran into a very long timeout while trying to fetch a page. The wire log looked like:

09/05/04 16:03:39 DEBUG client.DefaultRequestDirector:408 - Attempt 1 to execute request
09/05/04 16:03:39 DEBUG http.wire:78 - >> "GET /noticias/elrostrodeanaliacanal9-argentina-elrostrodeanalia-canal9/ HTTP/1.1[EOL]"
09/05/04 16:03:39 DEBUG http.wire:78 - >> "Host: telenovelas.censuratv.net[EOL]"
09/05/04 16:03:39 DEBUG http.wire:78 - >> "Connection: Keep-Alive[EOL]"
09/05/04 16:03:39 DEBUG http.wire:78 - >> "User-Agent: bixo[EOL]"
09/05/04 16:03:39 DEBUG http.wire:78 - >> "[EOL]"
09/05/04 16:03:39 DEBUG http.headers:251 - >> GET /noticias/elrostrodeanaliacanal9-argentina-elrostrodeanalia-canal9/ HTTP/1.1
09/05/04 16:03:39 DEBUG http.headers:254 - >> Host: telenovelas.censuratv.net
09/05/04 16:03:39 DEBUG http.headers:254 - >> Connection: Keep-Alive
09/05/04 16:03:39 DEBUG http.headers:254 - >> User-Agent: bixo
09/05/04 16:13:32 DEBUG conn.DefaultClientConnection:160 - Connection closed
09/05/04 16:13:32 DEBUG client.DefaultRequestDirector:414 - Closing the connection.
09/05/04 16:13:32 DEBUG conn.DefaultClientConnection:160 - Connection closed
09/05/04 16:13:32 INFO client.DefaultRequestDirector:418 - I/O exception (org.apache.http.NoHttpResponseException) caught when processing request: The target server failed to respond
09/05/04 16:13:32 DEBUG client.DefaultRequestDirector:423 - The target server failed to respond
09/05/04 16:13:32 INFO client.DefaultRequestDirector:425 - Retrying request
09/05/04 16:13:32 DEBUG client.DefaultRequestDirector:433 - Reopening the direct connection.
09/05/04 16:14:24 DEBUG conn.DefaultClientConnection:147 - Connection shut down
09/05/04 16:14:24 DEBUG tsccm.ThreadSafeClientConnManager:223 - Released connection is not reusable.
09/05/04 16:14:24 DEBUG tsccm.ConnPoolByRoute:374 - Releasing connection [HttpRoute[{}->http://telenovelas.censuratv.net]][null]
09/05/04 16:14:24 DEBUG tsccm.ConnPoolByRoute:631 - Notifying no-one, there are no waiting threads
09/05/04 16:14:24 DEBUG http.HttpClientFetcher:267 - Exception while fetching url http://telenovelas.censuratv.net/noticias/elrostrodeanaliacanal9-argentina-elrostrodeanalia-canal9/
java.net.UnknownHostException: telenovelas.censuratv.net

So the first try failed after about 10 minutes with an I/O exception (org.apache.http.NoHttpResponseException), then the retry failed much faster (about 50 seconds) with a java.net.UnknownHostException.

I'm guessing that maybe the real cause of the first long timeout was my DNS system timing out while trying to resolve the invalid server address, and then this "bad hostname" result was cached so that the retry failed faster.

But independent of the above, I'm interested in the best way to prevent all cases of long timeouts, with 4.0.

Thanks much!

-- Ken

PS - I could do my own DNS resolver that maps hostnames to IP addresses, and wrap this with a timer to fail after 10 seconds or so.
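
Roughly what I have in mind for that, as a minimal sketch using only the JDK. The class and method names (TimedDnsResolver, resolve) are made up for illustration and are not part of HttpClient. It runs InetAddress.getByName() on a worker thread and gives up waiting after the timeout:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not an HttpClient class.
public class TimedDnsResolver {

    // Daemon threads, so a lookup that never returns cannot block JVM shutdown.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-dns");
        t.setDaemon(true);
        return t;
    });

    // Resolves host, waiting at most timeoutMillis for the answer.
    public static InetAddress resolve(String host, long timeoutMillis)
            throws UnknownHostException {
        Callable<InetAddress> task = () -> InetAddress.getByName(host);
        Future<InetAddress> lookup = POOL.submit(task);
        try {
            return lookup.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort only: the underlying lookup may keep running on its thread.
            lookup.cancel(true);
            throw new UnknownHostException(
                host + ": lookup timed out after " + timeoutMillis + " ms");
        } catch (ExecutionException e) {
            if (e.getCause() instanceof UnknownHostException) {
                throw (UnknownHostException) e.getCause();
            }
            throw new UnknownHostException(host + ": " + e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new UnknownHostException(host + ": interrupted during lookup");
        }
    }
}
```

The caller would do something like TimedDnsResolver.resolve(hostname, 10000) before issuing the request and skip the URL on UnknownHostException. This only bounds how long the caller waits; the stuck lookup itself is abandoned on its worker thread, not killed.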
--
Ken Krugler
+1 530-210-6378
