[ https://issues.apache.org/jira/browse/NUTCH-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531236#comment-17531236 ]
ASF GitHub Bot commented on NUTCH-2946:
---------------------------------------

sebastian-nagel opened a new pull request, #728:
URL: https://github.com/apache/nutch/pull/728

   Make the fetcher slow down fetching from hosts where requests fail repeatedly with exceptions or HTTP status codes mapped to ProtocolStatus.EXCEPTION (HTTP 403 Forbidden, 429 Too Many Requests, 5xx server errors, etc.)

> Fetcher: optionally slow down fetching from hosts with repeated exceptions
> --------------------------------------------------------------------------
>
>                 Key: NUTCH-2946
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2946
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.19
>
>
> For every fetch queue, the fetcher keeps a counter of the "exceptions"
> observed when fetching from the host (or domain or IP, depending on the
> queue mode) bound to this queue.
> To make the crawler more polite, the counter value could be used to
> dynamically increase the fetch delay for hosts where requests fail
> repeatedly with exceptions or HTTP status codes mapped to
> ProtocolStatus.EXCEPTION (HTTP 403 Forbidden, 429 Too Many Requests, 5xx
> server errors, etc.). Of course, this should be optional. The aim is to
> reduce the load on such hosts even before the configured maximum number of
> exceptions (property fetcher.max.exceptions.per.queue) is hit.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
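
The idea described in the issue, a per-queue exception counter that drives the fetch delay, could be sketched roughly as follows. This is an illustration only and not the code from the pull request: the class HostBackoffSketch, its method names, the doubling rule and the delay cap are all assumptions made for the example. Only the notion of a per-queue exception counter and a maximum number of exceptions (fetcher.max.exceptions.per.queue) comes from the issue text.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of a per-queue backoff. Not Nutch code: the class
 * name, method names and the doubling rule are assumptions.
 */
final class HostBackoffSketch {
  private final long baseDelayMs;   // regular fetch delay for the queue
  private final long maxDelayMs;    // assumed upper bound for the delay
  private final int maxExceptions;  // stands in for fetcher.max.exceptions.per.queue
  private final AtomicInteger exceptionCounter = new AtomicInteger();

  HostBackoffSketch(long baseDelayMs, long maxDelayMs, int maxExceptions) {
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.maxExceptions = maxExceptions;
  }

  /** Count one failed request (exception, HTTP 403, 429, 5xx, ...). */
  int recordException() {
    return exceptionCounter.incrementAndGet();
  }

  /** True once the configured maximum is hit; a non-positive limit disables the check. */
  boolean limitReached() {
    return maxExceptions > 0 && exceptionCounter.get() >= maxExceptions;
  }

  /** Delay before the next request: doubled per counted exception, capped at maxDelayMs. */
  long currentDelayMs() {
    int n = exceptionCounter.get();
    long delay = baseDelayMs;
    // Stop doubling as soon as the cap is reached to avoid overflow.
    for (int i = 0; i < n && delay < maxDelayMs; i++) {
      delay *= 2;
    }
    return Math.min(delay, maxDelayMs);
  }
}
```

With a base delay of 1000 ms and a cap of 8000 ms, the delay in this sketch grows from 1000 to 2000, 4000 and 8000 ms over the first three exceptions and then stays at the cap until the exception limit ends fetching from the queue. A linear increase or a counter that is reset after successful fetches would be equally plausible choices; the issue does not prescribe one.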