http://nagoya.apache.org/jira - it does not work right now; I am trying to upload a new Http-Plugin which seems to be 100 times faster.
1. A TCP connection costs a lot, not only for Nutch and the end point but also for intermediary network equipment.
2. The web server creates a client thread and hopes that Nutch really uses HTTP/1.1, or that Nutch at least sends "Connection: close" before the JVM closes the connection in Socket.close().

I need to perform very objective tests, probably 2-3 days; the new plugin crawled/parsed 23,000 pages in 1,321 seconds; it seems that the existing http-plugin needs a few days... I am using a separate network segment with Windows XP (Nutch) and SUSE Linux (Apache HTTPD + 120,000 pages).

-----Original Message-----
From: Daniele Menozzi [mailto:[EMAIL PROTECTED]
Sent: Monday, October 10, 2005 5:42 PM
To: [email protected]
Subject: Re: Re[2]: what contibute to fetch slowing down

On 03:36:45 03/Oct , Michael wrote:
> 3mbit, 100 threads = 15 pages/sec
> cpu is low during fetch, so its bandwidth limit.

Yes, cpu is low, and even memory is quite free. But with 10MB in/out I cannot obtain good results (and I do not parse results, I simply fetch them). If I use 100 threads, I can download pages at 500KB/s for about 5 seconds, but after that the download rate falls to 0. If I set 20 threads, I can download at 200KB/s for 4-5 minutes, and the rate initially seems very stable. But after these few minutes the rate starts to get lower and lower, and tends to reach zero pages/s. I cannot understand what the problem could be. Whatever thread count I choose, the rate _always_ decreases until it has reached 1-2 pages/s. I've tried 2 different machines, but the problem is always the same. Can you please give me some advice?

Thank you
	Daniele

--
Free Software Enthusiast
Debian Powered Linux User #332564
http://menoz.homelinux.org
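To illustrate point 2 above: under HTTP/1.1 connections are persistent by default, so a client that intends to call Socket.close() should announce "Connection: close" first, rather than leave the server thread waiting for a next request that never comes. A minimal sketch in Java (buildRequest is a hypothetical helper for illustration, not actual http-plugin code):

```java
// Sketch of the two polite client behaviours under HTTP/1.1:
// either keep the connection alive for reuse, or declare
// "Connection: close" before closing the socket.
public class RequestSketch {
    static String buildRequest(String host, String path, boolean close) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.1\r\n");
        sb.append("Host: ").append(host).append("\r\n");
        // HTTP/1.1 defaults to persistent connections; announcing
        // "Connection: close" tells the server not to wait for
        // further requests after this response.
        sb.append("Connection: ").append(close ? "close" : "keep-alive");
        sb.append("\r\n\r\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // A fetcher that will close the socket after one response:
        System.out.print(buildRequest("example.org", "/index.html", true));
    }
}
```

Sending the close variant (or reusing one socket for many requests) avoids the per-connection cost in point 1, since each TCP setup/teardown burdens the client, the server, and everything in between.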
