Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by robotgenius:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  
  If you are using a slow internet connection (e.g., DSL), you might be limited 
to 40 or fewer concurrent fetches.
  
- If you have a fast internet connection (> 10Gb/sec) your bottleneck will 
definitely be in the machine itself (in fact you will need multiple machines to 
saturate the data pipe).  Empirically I have found that the machine works well 
up to about 1000-1500 threads.  
+ If you have a fast internet connection (> 10Mb/sec) your bottleneck will 
definitely be in the machine itself (in fact you will need multiple machines to 
saturate the data pipe).  Empirically I have found that the machine works well 
up to about 1000-1500 threads.  
  
  To get this to work on my Linux box I needed to raise the open-file limit to 
65535 (ulimit -n 65535), and I had to make sure that the DNS server could handle 
the load (we had to speak with our colo to get them to shut off an artificial 
cap on the DNS servers).  Also, in order to get the speed up to a reasonable 
value, we needed to set the maximum fetches per host to 100 (otherwise we got a 
quick start followed by a very long slow tail of fetching).
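
  As a rough way to check the open-file limit before starting a large fetch, 
the sketch below compares the current limit with an estimate of what a given 
thread count might need.  The helper names, the figure of 2 descriptors per 
thread, and the reserve of 256 are illustrative assumptions, not values 
measured on Nutch; adjust them for your own setup.

```python
import resource


def descriptors_needed(threads, fds_per_thread=2, reserve=256):
    """Estimate open files needed for a fetch run.

    fds_per_thread and reserve are assumed placeholder values,
    not figures taken from Nutch itself.
    """
    return threads * fds_per_thread + reserve


def fd_headroom(threads, fds_per_thread=2, reserve=256):
    """Return (soft_limit, needed, ok) for the current process."""
    # RLIMIT_NOFILE is the limit that `ulimit -n` reports and changes.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = descriptors_needed(threads, fds_per_thread, reserve)
    ok = soft == resource.RLIM_INFINITY or soft >= needed
    return soft, needed, ok


if __name__ == "__main__":
    soft, needed, ok = fd_headroom(1500)
    print(f"soft limit={soft} estimated need={needed} ok={ok}")
```

  If `ok` comes back false, raising the limit in the launching shell (for 
example with `ulimit -n 65535`, as above) before starting the fetcher is one 
option.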
  
