[ https://issues.apache.org/jira/browse/NUTCH-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531245#comment-17531245 ]
Sebastian Nagel commented on NUTCH-2946:
----------------------------------------
> If you'd prefer this to be optional, I would prefer it to be enabled by
> default.
Agreed. What about using the value of fetcher.server.delay as the default
value? The delay would then be 2 * fetcher.server.delay after the first
error and would double every time the number of errors doubles. Any other
default would work as well. I would also update the PR so that
fetcher.exceptions.per.queue.delay is configured in seconds as a float (instead
of milliseconds).
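
To make the proposed schedule concrete, here is a minimal sketch in Java. It
is not the code from the PR. The class and method names are hypothetical, and
it assumes one possible reading of the proposal: the additional delay (in
seconds, as a float) is added on top of fetcher.server.delay and is multiplied
by 2^floor(log2(n)) after n exceptions, so that it doubles whenever the number
of exceptions doubles.

```java
/**
 * Hypothetical sketch of the proposed backoff, not the actual Nutch code.
 * Assumption: the extra delay is added to the regular server delay and
 * doubles each time the exception count doubles (1, 2, 4, 8, ...).
 */
final class ExceptionBackoffSketch {

    /**
     * @param serverDelaySec    value of fetcher.server.delay in seconds
     * @param exceptionDelaySec value of fetcher.exceptions.per.queue.delay in seconds
     * @param exceptionCount    number of exceptions observed for the queue so far
     * @return total delay before the next fetch, in milliseconds
     */
    static long totalDelayMillis(double serverDelaySec, double exceptionDelaySec,
                                 int exceptionCount) {
        int n = Math.max(0, exceptionCount);
        // Integer.highestOneBit(n) is 2^floor(log2(n)) for n >= 1 and 0 for n == 0,
        // so no extra delay is applied before the first exception.
        int factor = Integer.highestOneBit(n);
        double totalSec = serverDelaySec + exceptionDelaySec * factor;
        return Math.round(totalSec * 1000.0);
    }

    public static void main(String[] args) {
        double serverDelay = 5.0;           // assumed example value
        double exceptionDelay = serverDelay; // proposed default: same as the server delay
        for (int n = 0; n <= 8; n++) {
            System.out.println(n + " exceptions -> "
                + totalDelayMillis(serverDelay, exceptionDelay, n) + " ms");
        }
    }
}
```

With both values set to 5 seconds, this yields 5000 ms without errors, 10000 ms
(2 * fetcher.server.delay) after the first exception, 15000 ms after the second
and third, and 25000 ms after the fourth. If instead the total delay is meant
to double, only the line computing totalSec would need to change.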
> Fetcher: optionally slow down fetching from hosts with repeated exceptions
> --------------------------------------------------------------------------
>
> Key: NUTCH-2946
> URL: https://issues.apache.org/jira/browse/NUTCH-2946
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 1.18
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Major
> Fix For: 1.19
>
>
> The fetcher keeps a counter for every fetch queue which tracks the number of
> "exceptions" observed when fetching from the host (resp. domain or IP)
> bound to this queue.
> As an improvement to increase the politeness of the crawler, the counter
> value could be used to dynamically increase the fetch delay for hosts where
> requests fail repeatedly with exceptions or HTTP status codes mapped to
> ProtocolStatus.EXCEPTION (HTTP 403 Forbidden, 429 Too Many Requests, 5xx
> server errors, etc.). Of course, this should be optional. The aim is to
> reduce the load on such hosts even before the configured max. number of
> exceptions (property fetcher.max.exceptions.per.queue) is hit.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)