Fuad Efendi wrote:
If I am right, we are simply _killing_ many sites that run a default Apache HTTPD installation (or Microsoft IIS, etc.). The default is 150 keep-alive client threads; I configured 6000 threads for the Worker model, but that is very unusual. A new client thread is created for every single HTTP request from Nutch, so after 150 pages we simply overload the web server and receive a "connection timeout exception".
I would be surprised if a web server, on exhausting its keep-alive cache, wouldn't simply close some of those connections. Repeated connections without keep-alive should not harm a web server, as long as they're polite.
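To make "polite" concrete, here is a rough sketch of a per-host delay: each host gets at most one request per fixed interval, whether or not connections are kept alive. The class name PoliteScheduler, the method reserve, and the 5-second delay are all made up for illustration; this is not Nutch's actual fetcher code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Nutch): remembers, per host, the
 * earliest time at which the next request may be sent.
 */
public class PoliteScheduler {
    private final long delayMillis;
    private final Map<String, Long> nextAllowed = new HashMap<>();

    public PoliteScheduler(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /**
     * Reserves the next fetch slot for the given host and returns how many
     * milliseconds the caller should wait before opening the connection.
     */
    public synchronized long reserve(String host, long nowMillis) {
        long earliest = nextAllowed.getOrDefault(host, nowMillis);
        // Never schedule in the past, and never before the host's next free slot.
        long start = Math.max(earliest, nowMillis);
        nextAllowed.put(host, start + delayMillis);
        return start - nowMillis;
    }
}
```

A fetcher thread would call reserve(host, System.currentTimeMillis()), sleep for the returned number of milliseconds, and only then connect. Requests to different hosts do not delay each other.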
We need to use a real web server during tests, and an HTTP proxy (http://grinder.sourceforge.net, a very simple Java-based proxy).
That would be a great contribution. Do you have time to work on this?

Doug
