Doug,
Thanks for the reply.
I'll try to run specific tests against an in-house Apache instance this
week(end) (I'm slightly limited in time... sorry!). Everything is
possible: Apache httpd has a timeout setting for keep-alive, and the
default is (if I remember correctly) somewhere around 600 seconds. I ran
such tests a while ago using Grinder... I can also create baselines:
Nutch vs. Grinder and Nutch vs. Teleport. Grinder can save HTTP replies
in log files (it is mostly a load-generation tool); Teleport is a
commercial web grabber...
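For reference (my addition, not from the original thread), the keep-alive behavior being discussed is controlled by a handful of directives in a stock Apache 2.x httpd.conf; note that the shipped default KeepAliveTimeout is 15 seconds, and the prefork MPM's default MaxClients is the 150 mentioned in the quoted message:

```apache
# Stock Apache 2.x defaults relevant to keep-alive exhaustion:
KeepAlive On               # allow persistent (keep-alive) connections
MaxKeepAliveRequests 100   # requests served per persistent connection
KeepAliveTimeout 15        # seconds to wait for the next request on an idle connection

# prefork MPM section:
MaxClients 150             # concurrent client slots; default matches the "150" above
```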
For HTTP/1.0 we should send an explicit "Connection: close" header
before calling Socket.close()... But we need to run real tests anyway.
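The HTTP/1.0 point above can be sketched as follows (a minimal illustration with a hypothetical helper class, not actual Nutch code): the request line declares HTTP/1.0, and the explicit "Connection: close" header opts out of keep-alive (which is also the HTTP/1.0 default behavior) so the server knows the connection ends after one response.

```java
// Hypothetical helper (not Nutch code): build an HTTP/1.0 GET request
// that explicitly declines keep-alive before the socket is closed.
public class Http10Request {
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"  // explicit, though close is the HTTP/1.0 default
             + "\r\n";                  // blank line terminates the request headers
    }

    public static void main(String[] args) {
        System.out.print(buildRequest("example.com", "/"));
    }
}
```

After the response is read, the client can call Socket.close() without leaving a keep-alive slot tied up on the server.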


-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 03, 2005 1:05 PM
To: [email protected]
Subject: Re: what contibute to fetch slowing down


Fuad Efendi wrote:
> If I am right, we are simply _killing_ many many sites with default 
> Apache HTTPD installation (Microsoft IIS, etc.) (150 keep-alive client 
> threads; I configured 6000 threads for Worker model, but it was very 
> unusual). Those client threads are created each time for each single 
> HTTP request from Nutch, after 150 pages we are simply overloading Web 
> Server, and we receive "connection timeout exception".

I would be surprised if a web server, on exhausting its keep-alive 
cache, wouldn't simply close some of those connections.  Repeated 
connections without keep-alive should not harm a web server, as long as 
they're polite.

> We need to use real Web Server during tests, and HTTP Proxy 
> (http://grinder.sourceforge.net - very simple Java based proxy)

That would be a great contribution.  Do you have time to work on this?

Doug