Daniele Menozzi wrote:
On 10:27:55 28/Sep, AJ Chen wrote:

I started the crawler with about 2000 sites. The fetcher could achieve 7 pages/sec initially, but the performance gradually dropped to about 2 pages/sec, sometimes even 0.5 pages/sec. The fetch list had 300k pages and I used 500 threads. What are the main causes of this slowing down?


I have the same problem; I've tried different numbers of fetchers
(10, 20, 50, 100, ...), but the download rate always decreases systematically,
page after page.

I suspect threads are hanging, probably in the parser, but sometimes TCP connections get stuck too. Use 'kill -QUIT' on the Java process to generate a stack dump for all threads, or use 'lsof' to see its open TCP connections. We should probably modify the Fetcher to terminate any thread that takes more than a certain amount of time to process an individual request.
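
The per-request timeout idea could look roughly like the sketch below. This is not the actual Fetcher code: the class name FetchTimeoutSketch, the helper runWithTimeout, and the timeout values are all hypothetical, and the "page" tasks are stand-ins for a real fetch-and-parse step. It only illustrates bounding each request with a deadline and interrupting the worker when the deadline passes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FetchTimeoutSketch {

    // Hypothetical helper: run one fetch-and-parse task, giving up after timeoutMs.
    // Returns the task's result, or null if it timed out or failed.
    static String runWithTimeout(ExecutorService pool, Callable<String> task, long timeoutMs) {
        Future<String> future = pool.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Request took too long: interrupt the worker so it can be reused.
            future.cancel(true);
            return null;
        } catch (InterruptedException e) {
            // The calling thread was interrupted: restore the flag and give up.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            // The task itself threw an exception; treat it as a failed request.
            return null;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Stand-in for a page that is processed quickly.
            String fast = runWithTimeout(pool, () -> "ok", 1000);

            // Stand-in for a hung parser or stuck connection.
            String slow = runWithTimeout(pool, () -> {
                Thread.sleep(5000);
                return "late";
            }, 200);

            System.out.println("fast request result: " + fast);
            System.out.println("slow request result: " + slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

One caveat with this approach: cancel(true) only interrupts the worker thread, so it helps only if the stuck code responds to interruption. A thread blocked in a classic blocking socket read may not, so connect and read timeouts on the sockets are probably needed as well.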

Doug
