Nutch, by default, waits five seconds between successive requests to
the same server, though this delay is easily overridden. Nutch also
obeys the robots exclusion standard (robots.txt) and can be configured
to match robots.txt rules against an agent identifier other than
"Nutch".
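Both behaviors are controlled through the crawler's configuration. A
minimal sketch of the relevant overrides in conf/nutch-site.xml
(property names are the standard ones from nutch-default.xml; the
"MyCrawler" agent name is only a placeholder):

  <configuration>
    <!-- Minimum delay, in seconds, between successive requests
         to the same server (5.0 is the shipped default). -->
    <property>
      <name>fetcher.server.delay</name>
      <value>5.0</value>
    </property>
    <!-- Agent name the crawler identifies itself as;
         "MyCrawler" is a placeholder. -->
    <property>
      <name>http.agent.name</name>
      <value>MyCrawler</value>
    </property>
    <!-- Agent strings matched against robots.txt rules,
         in decreasing order of precedence. -->
    <property>
      <name>http.robots.agents</name>
      <value>MyCrawler,*</value>
    </property>
  </configuration>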
If the trouble continues, the best option is to contact the crawler's
host or ISP.
On Sun, 29 Apr 2012 06:29:35 -0700, Jerry Durand
<[email protected]> wrote:
If you're going to give out web scrapers, PLEASE put a delay between
file downloads. Nutch was locking up our system, almost a DoS
attack.
Hopefully Nutch obeys the robots.txt file.