Sebastian Nagel created NUTCH-2946:
--------------------------------------
Summary: Fetcher: optionally slow down fetching from hosts with
repeated exceptions
Key: NUTCH-2946
URL: https://issues.apache.org/jira/browse/NUTCH-2946
Project: Nutch
Issue Type: Improvement
Components: fetcher
Affects Versions: 1.18
Reporter: Sebastian Nagel
Assignee: Sebastian Nagel
Fix For: 1.19
For every fetch queue, the fetcher keeps a counter of the "exceptions"
observed when fetching from the host (or domain or IP address) bound to
this queue.
As an improvement to increase the politeness of the crawler, the counter value
could be used to dynamically increase the fetch delay for hosts where requests
fail repeatedly with exceptions or HTTP status codes mapped to
ProtocolStatus.EXCEPTION (HTTP 403 Forbidden, 429 Too Many Requests, 5xx server
errors, etc.). Of course, this should be optional. The aim is to reduce the
load on such hosts even before the configured maximum number of exceptions
(property fetcher.max.exceptions.per.queue) is reached.
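
To illustrate the idea, here is a minimal sketch of how a per-queue exception
count could be turned into a longer fetch delay. The issue does not specify a
formula, so the exponential growth, the class name ExceptionBackoffSketch, the
method adjustedDelayMillis and the maxDelayMillis cap are all assumptions made
for illustration. None of them are taken from the Nutch code base.

```java
public class ExceptionBackoffSketch {

    /**
     * Hypothetical helper: doubles the base delay for every exception
     * counted on a queue, up to a configured maximum.
     *
     * @param baseDelayMillis configured fetch delay for the queue
     * @param exceptionCount  exceptions observed so far on this queue
     * @param maxDelayMillis  assumed upper bound for the adjusted delay
     */
    static long adjustedDelayMillis(long baseDelayMillis, int exceptionCount,
                                    long maxDelayMillis) {
        if (exceptionCount <= 0) {
            // No exceptions observed: keep the configured delay.
            return baseDelayMillis;
        }
        // Limit the shift so the factor itself cannot overflow a long.
        int shift = Math.min(exceptionCount, 30);
        long factor = 1L << shift;
        long delay;
        if (baseDelayMillis > maxDelayMillis / factor) {
            // The product would exceed the cap (or overflow): use the cap.
            delay = maxDelayMillis;
        } else {
            delay = baseDelayMillis * factor;
        }
        // Never fall below the configured base delay.
        return Math.max(baseDelayMillis, delay);
    }

    public static void main(String[] args) {
        for (int exceptions = 0; exceptions <= 5; exceptions++) {
            System.out.println(exceptions + " exception(s): "
                + adjustedDelayMillis(5000L, exceptions, 60000L) + " ms");
        }
    }
}
```

Under these assumptions, a base delay of 5 seconds and a cap of 60 seconds
would give delays of 10, 20 and 40 seconds after the first three exceptions,
and 60 seconds from the fourth onward. The cap keeps a queue from being
stalled indefinitely before fetcher.max.exceptions.per.queue is reached.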
--
This message was sent by Atlassian Jira
(v8.20.7#820007)