By default, Nutch waits five seconds between successive requests to the same server, though that delay is easy to override. Nutch also obeys the robots exclusion standard, and it can be configured to match robots.txt rules against an agent identifier other than "Nutch".
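For reference, both knobs are ordinary configuration properties overridden in conf/nutch-site.xml. A minimal sketch, assuming a Nutch 1.x setup; the property names come from nutch-default.xml, and the agent name "example-crawler" is just a placeholder:

  <configuration>
    <!-- Seconds to wait between successive requests to the same host
         (ships as 5.0); raise it if a site reports excessive load. -->
    <property>
      <name>fetcher.server.delay</name>
      <value>10.0</value>
    </property>
    <!-- Identifier the crawler announces to servers; placeholder value. -->
    <property>
      <name>http.agent.name</name>
      <value>example-crawler</value>
    </property>
    <!-- Agent names matched against robots.txt rules, highest
         precedence first. -->
    <property>
      <name>http.robots.agents</name>
      <value>example-crawler,*</value>
    </property>
  </configuration>

With an agent name set this way, a site operator can also slow or exclude the crawler directly in robots.txt by targeting that name.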

If the crawling continues to cause trouble, the best option is to contact the operator's host or ISP.

On Sun, 29 Apr 2012 06:29:35 -0700, Jerry Durand <[email protected]> wrote:
> If you're going to give out web scrapers, PLEASE put a delay between
> file downloads. Nutch was locking up our system; it was almost a DoS
> attack. Hopefully Nutch obeys the robots.txt file.
