Hi,

Protocol-httpclient sets the maximum number of total connections for the underlying commons-httpclient from the "fetcher.threads.fetch" configuration parameter. However, passing the -threads argument to the fetcher does not change fetcher.threads.fetch. So whatever number of threads is given to -threads, httpclient still uses the default value for the total number of connections (10). This hurts crawl performance whenever -threads is set higher than 10. It seems to be a bug. Any comments on this?
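To make the mismatch concrete, here is a minimal self-contained sketch. It does not use the real Nutch classes: Conf and Fetcher are hypothetical stand-ins, and the propagate flag exists only to compare the two behaviours side by side. The only things taken from the report are the key name and the default of 10.

```java
import java.util.HashMap;
import java.util.Map;

public class ThreadCountSketch {
    static final String KEY = "fetcher.threads.fetch";
    static final int DEFAULT_THREADS = 10; // default reported above

    /** Hypothetical stand-in for the configuration object (not the real NutchConf). */
    static class Conf {
        private final Map<String, Integer> values = new HashMap<>();

        int getInt(String key, int defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }

        void setInt(String key, int value) {
            values.put(key, value);
        }
    }

    /** Hypothetical stand-in for the fetcher; only thread-count handling is modeled. */
    static class Fetcher {
        private final Conf conf;
        private final boolean propagate; // illustration only, not a real option
        private int threadCount;

        Fetcher(Conf conf, boolean propagate) {
            this.conf = conf;
            this.propagate = propagate;
            this.threadCount = conf.getInt(KEY, DEFAULT_THREADS);
        }

        /** Called when -threads is given on the command line. */
        void setThreadCount(int threadCount) {
            this.threadCount = threadCount;
            if (propagate) {
                // Write the value back so other components see it too.
                conf.setInt(KEY, threadCount);
            }
        }
    }

    /** What an HTTP plugin reading the configuration would use as its connection limit. */
    static int maxTotalConnections(Conf conf) {
        return conf.getInt(KEY, DEFAULT_THREADS);
    }

    public static int connectionLimitAfterThreadsArg(int threads, boolean propagate) {
        Conf conf = new Conf();
        Fetcher fetcher = new Fetcher(conf, propagate);
        fetcher.setThreadCount(threads);
        return maxTotalConnections(conf);
    }

    public static void main(String[] args) {
        System.out.println("without propagation: " + connectionLimitAfterThreadsArg(50, false));
        System.out.println("with propagation:    " + connectionLimitAfterThreadsArg(50, true));
    }
}
```

In this sketch, 50 fetcher threads end up sharing 10 connections unless the thread count is written back to the configuration.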
A possible solution is to add the line below to the setThreadCount function of the Fetcher class:

    NutchConf.get().setInt("fetcher.threads.fetch", threadCount);

Also, the fetcher seems to be using a lot of memory, maybe due to a memory leak. It starts at 10-15%; after several hours, the Linux top command reports that it is using 50-70% of total memory. Is anyone else experiencing this behaviour?

Thanks,
-orkunt.