Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by robotgenius:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------

  If you are using a slow internet connection (i.e., DSL), you might be limited to 40 or fewer concurrent fetches.
- If you have a fast internet connection (> 10Gb/sec) your bottleneck will definitely be in the machine itself (in fact you will need multiple machines to saturate the data pipe). Empirically I have found that the machine works well up to about 1000-1500 threads.
+ If you have a fast internet connection (> 10Mb/sec) your bottleneck will definitely be in the machine itself (in fact you will need multiple machines to saturate the data pipe). Empirically I have found that the machine works well up to about 1000-1500 threads.
  To get this to work on my Linux box I needed to set the ulimit to 65535 (ulimit -n 65535), and I had to make sure that the DNS server could handle the load (we had to speak with our colo to get them to shut off an artificial cap on the DNS servers). Also, in order to get the speed up to a reasonable value, we needed to set the maximum fetches per host to 100 (otherwise we get a quick start followed by a very long slow tail of fetching).
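Why a per-host cap removes the "quick start followed by a very long slow tail" can be illustrated with a rough back-of-the-envelope model. This is a minimal sketch under stated assumptions, not a description of how the Nutch fetcher actually schedules work: it assumes each fetch occupies one thread for `fetch_secs`, that requests to the same host are serialized with a politeness gap of `delay_secs`, and that URLs over the cap are simply left out of this round. The function `estimate_fetch_secs`, its parameters, and the default timings are hypothetical, chosen only for illustration.

```python
def estimate_fetch_secs(urls_per_host, threads, fetch_secs=1.0,
                        delay_secs=5.0, max_per_host=None):
    """Rough lower bound on the wall-clock time of one fetch round.

    Hypothetical model (assumptions, not Nutch's real scheduler):
    - a host with n URLs is fetched one request at a time, so it needs
      about n * (fetch_secs + delay_secs) seconds on its own;
    - the whole list cannot finish faster than its total fetch work
      spread evenly across all threads.
    """
    # Apply the optional per-host cap; excess URLs are dropped from this round.
    if max_per_host is not None:
        urls_per_host = [min(n, max_per_host) for n in urls_per_host]
    # The busiest host sets the length of the "slow tail".
    slowest_host = max(urls_per_host) * (fetch_secs + delay_secs)
    # Time if every thread stayed busy for the whole round.
    parallel_work = sum(urls_per_host) * fetch_secs / threads
    return max(slowest_host, parallel_work)


if __name__ == "__main__":
    # 999 small hosts plus one large host, fetched with 1000 threads.
    fetch_list = [50] * 999 + [10_000]
    print(estimate_fetch_secs(fetch_list, threads=1000))                    # → 60000.0
    print(estimate_fetch_secs(fetch_list, threads=1000, max_per_host=100))  # → 600.0
```

In this toy model the other 999 hosts finish within a few minutes, while the single large host keeps the round running for about 60,000 s; capping every host at 100 URLs, the value given in the FAQ, brings the estimate down to 600 s.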