Daniele Menozzi wrote:
On 10:27:55 28/Sep, AJ Chen wrote:
I started the crawler with about 2000 sites. The fetcher could achieve
7 pages/sec initially, but the performance gradually dropped to about 2
pages/sec, sometimes even 0.5 pages/sec. The fetch list had 300k pages
and I used 500 threads. What are the main causes of this slowing down?
I have the same problem; I've tried different numbers of fetcher threads
(10, 20, 50, 100, ...), but the download rate always decreases
systematically, page after page.
I suspect threads are hanging, probably in the parser, but sometimes TCP
connections get stuck too. Use 'kill -QUIT' to generate a stack dump
for all threads. Or use 'lsof' to see open TCP connections. We should
probably modify the Fetcher to terminate any thread that takes more than
a certain amount of time to process an individual request.
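
A minimal sketch of that per-request time limit, not the actual Fetcher
code. It assumes each request can be wrapped in a Callable; the class
name TimedFetchSketch and the helper runWithTimeout are hypothetical and
exist only for this illustration. Java has no safe way to kill a thread
outright, so the sketch cancels the task, which interrupts the worker.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedFetchSketch {

    // Hypothetical helper: runs one request on the pool and waits at most
    // timeoutMillis for it. Returns null if the request fails or times out.
    public static String runWithTimeout(ExecutorService pool,
                                        Callable<String> request,
                                        long timeoutMillis)
            throws InterruptedException {
        Future<String> future = pool.submit(request);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The request took too long: interrupt the worker and give up on it.
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            // The request itself threw an exception; treat it as a failed fetch.
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // A request that completes well within the limit.
            String ok = runWithTimeout(pool, () -> "fetched", 1000);

            // A request that simulates a hung parser or connection.
            String stuck = runWithTimeout(pool, () -> {
                Thread.sleep(60_000);
                return "late";
            }, 200);

            System.out.println("fast request: " + ok);
            System.out.println("stuck request: " + stuck);
        } finally {
            // Interrupt anything still running so the JVM can exit.
            pool.shutdownNow();
        }
    }
}
```

Note that an interrupt only frees a thread that is in an interruptible
wait. A thread blocked in a classic java.net socket read ignores it, so
stuck TCP connections would still need a socket read timeout as well.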
Doug