http://nagoya.apache.org/jira
- it does not work right now; I am trying to upload a new Http-Plugin which
seems to be 100 times faster.

1. Establishing a TCP connection costs a lot, not only for Nutch and the
end-point but also for intermediary network equipment.
2. The web server creates a client thread and hopes that Nutch really uses
HTTP/1.1, or at least that Nutch sends "Connection: close" before the JVM
calls "Socket.close()".
...
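To illustrate point 2 above: under HTTP/1.1 the default is a persistent
connection, so a server may keep a worker thread waiting for the next request
unless the client explicitly asks it to close. A minimal sketch (this is a
hypothetical helper for illustration, not Nutch's actual plugin code) of
building a request with and without the "Connection: close" header:

```java
// Hypothetical sketch: buildRequest is not part of Nutch; it only shows
// what the "Connection: close" header looks like on the wire.
public class RequestSketch {

    // Build a minimal HTTP/1.1 GET request for host/path. When withClose
    // is true, the server is told to tear the connection down after the
    // response; otherwise the HTTP/1.1 default (keep-alive) applies, and
    // a client that simply calls Socket.close() can leave the server
    // thread waiting for a follow-up request that never comes.
    static String buildRequest(String host, String path, boolean withClose) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.1\r\n");
        sb.append("Host: ").append(host).append("\r\n");
        if (withClose) {
            sb.append("Connection: close\r\n");
        }
        sb.append("\r\n"); // blank line ends the header section
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildRequest("example.com", "/", true));
    }
}
```

A fast fetcher would instead keep the connection open and reuse it for many
requests to the same host, which avoids the per-connection TCP cost from
point 1.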

I need to perform very objective tests, probably 2-3 days; the new plugin
crawled/parsed 23,000 pages in 1,321 seconds (roughly 17 pages/sec); it seems
that the existing http-plugin needs a few days...

I am using a separate network segment with Windows XP (Nutch) and SuSE Linux
(Apache HTTPD + 120,000 pages).



-----Original Message-----
From: Daniele Menozzi [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 10, 2005 5:42 PM
To: [email protected]
Subject: Re: Re[2]: what contibute to fetch slowing down


On  03:36:45 03/Oct , Michael wrote:
> 3mbit, 100 threads = 15 pages/sec
> cpu is low during fetch, so its bandwidth limit.

yes, CPU is low, and even memory is quite free. But with 10 MB in/out I
cannot obtain good results (and I do not parse the results, I simply fetch
them). If I use 100 threads, I can download pages at 500 KB/s for about 5
seconds, but after that the download rate falls to 0. If I set 20 threads, I
can download at 200 KB/s for 4-5 minutes, and the rate initially seems very
stable. But after these few minutes, the rate starts to get lower and lower,
and tends to reach zero pages/s.

I cannot understand what the problem could be. Whatever thread count I
choose, the rate _always_ decreases until it reaches 1-2 pages/s. I've tried
2 different machines, but the problem is always the same.

Can you please give me some advice?
Thank you
        Daniele



-- 
                      Free Software Enthusiast
                 Debian Powered Linux User #332564 
                     http://menoz.homelinux.org

