Hi,

Protocol-httpclient sets the maximum number of total connections for the
underlying commons-httpclient from the "fetcher.threads.fetch" configuration
parameter. However, passing the -threads argument to the fetcher does not
change fetcher.threads.fetch. So whatever number of threads is given to
-threads, httpclient still uses the default number of total connections
(10). This will affect crawl performance. It looks like a bug. Any
comments on this?

A possible fix is to add the line below to the setThreadCount function of
the Fetcher class:
 NutchConf.get().setInt("fetcher.threads.fetch", threadCount);
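
To illustrate the effect, here is a minimal, self-contained sketch. It does
not use the real Nutch classes: SketchConf and ThreadCountSketch are
hypothetical stand-ins, and the default of 10 is simply the value mentioned
above. It only shows that a value passed via -threads never reaches a reader
of fetcher.threads.fetch unless it is written back to the configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the configuration object; NOT the real NutchConf.
class SketchConf {
    private final Map<String, Integer> values = new HashMap<>();

    int getInt(String key, int defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }

    void setInt(String key, int value) {
        values.put(key, value);
    }
}

// Hypothetical model of the fetcher / protocol plugin interaction.
class ThreadCountSketch {
    static final String KEY = "fetcher.threads.fetch";
    static final int DEFAULT_CONNECTIONS = 10; // assumed default, as described above

    // threadsArg: the value given to -threads.
    // propagate:  whether setThreadCount also writes the value into the config.
    // Returns the connection limit a config reader would end up with.
    static int maxTotalConnections(int threadsArg, boolean propagate) {
        SketchConf conf = new SketchConf();
        int threadCount = threadsArg;
        if (propagate) {
            // The proposed fix: keep the config in sync with -threads.
            conf.setInt(KEY, threadCount);
        }
        // What a plugin reading the config would see afterwards.
        return conf.getInt(KEY, DEFAULT_CONNECTIONS);
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + maxTotalConnections(50, false));
        System.out.println("with fix:    " + maxTotalConnections(50, true));
    }
}
```

Without the write-back, the reader falls back to the default regardless of
the -threads value; with it, the reader sees the requested thread count.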

Also, the fetcher seems to use a lot of memory, maybe due to a memory leak.
It starts at 10-15%; after several hours the Linux top command reports that
it is using 50-70% of total memory. Is anyone else experiencing this
behaviour?

Thanks,
-orkunt.
