Doug,

Thanks for the reply. I'll try to run specific tests against an in-house Apache during this week(end) (slightly limited in time... sorry!). Everything is possible; Apache httpd has a timeout setting for keep-alive connections, and the default is (I don't remember exactly) probably 600 seconds. I ran such tests a while ago using Grinder... I can also create baselines: Nutch vs. Grinder, Nutch vs. Teleport. Grinder can save HTTP replies in log files (it is mostly a load-generation tool); Teleport is a commercial web grabber... For HTTP/1.0 we should send an explicit "Connection: close" header before calling Socket.close()... But we need to run real tests anyway.
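For illustration, here is a minimal sketch of what that explicit header looks like on a raw socket in Java. This is a hypothetical standalone helper, not Nutch's actual http protocol code; the class and method names are made up for the example.

```java
import java.io.*;
import java.net.*;

public class Http10Close {

    // Build a minimal HTTP/1.0 request that asks the server to close the
    // connection when it is done (hypothetical helper, not actual Nutch code).
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"   // explicit close hint for HTTP/1.0
             + "\r\n";
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {          // no host given: just show the request
            System.out.print(buildRequest("example.com", "/"));
            return;
        }
        String host = args[0];
        try (Socket socket = new Socket(host, 80)) {
            Writer out = new OutputStreamWriter(socket.getOutputStream(), "US-ASCII");
            out.write(buildRequest(host, "/"));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), "US-ASCII"));
            System.out.println(in.readLine());  // status line from the server
        }  // Socket.close() runs here, after the server has seen "Connection: close"
    }
}
```

With the header sent, the server knows not to hold the connection in its keep-alive pool waiting for another request from us.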
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 03, 2005 1:05 PM
To: [email protected]
Subject: Re: what contibute to fetch slowing down

Fuad Efendi wrote:
> If I am right, we are simply _killing_ many many sites with a default
> Apache HTTPD installation (Microsoft IIS, etc.) (150 keep-alive client
> threads; I configured 6000 threads for the worker model, but that was
> very unusual). Those client threads are created each time for each
> single HTTP request from Nutch; after 150 pages we are simply
> overloading the web server, and we receive a "connection timeout
> exception".

I would be surprised if a web server, on exhausting its keep-alive
cache, wouldn't simply close some of them. Repeated connections without
keep-alive should not harm a web server, as long as they're polite.

> We need to use a real web server during tests, and an HTTP proxy
> (http://grinder.sourceforge.net - a very simple Java-based proxy)

That would be a great contribution. Do you have time to work on this?

Doug
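For reference in setting up those tests, the server-side knobs under discussion are a handful of Apache directives. The values below are my recollection of the Apache 2.0-era defaults (worth re-checking against the actual httpd.conf on the test box); the 150 figure matches the stock prefork MaxClients:

```apache
# Assumed Apache 2.0 defaults -- verify against the test server's httpd.conf
KeepAlive            On    # allow persistent connections at all
MaxKeepAliveRequests 100   # requests served per persistent connection
KeepAliveTimeout     15    # seconds to wait for the next request before closing
Timeout              300   # overall I/O timeout, in seconds
MaxClients           150   # prefork: max simultaneous request workers
```

If the crawler leaves connections dangling, each one can pin a worker for up to KeepAliveTimeout seconds, which is how a polite-looking fetch can still exhaust a small MaxClients pool.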
