[ https://issues.apache.org/jira/browse/NUTCH-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531217#comment-17531217 ]
Markus Jelsma commented on NUTCH-2946:
--------------------------------------

Sounds good! If you'd prefer this to be optional, I would prefer it to be enabled by default.

> Fetcher: optionally slow down fetching from hosts with repeated exceptions
> --------------------------------------------------------------------------
>
>                 Key: NUTCH-2946
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2946
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.19
>
>
> For every fetch queue, the fetcher holds a counter of the "exceptions"
> observed when fetching from the host (resp. domain or IP) bound to this
> queue.
> As an improvement to increase the politeness of the crawler, the counter
> value could be used to dynamically increase the fetch delay for hosts where
> requests fail repeatedly with exceptions or HTTP status codes mapped to
> ProtocolStatus.EXCEPTION (HTTP 403 Forbidden, 429 Too Many Requests, 5xx
> server errors, etc.). Of course, this should be optional. The aim is to
> reduce the load on such hosts before the configured max. number of
> exceptions (property fetcher.max.exceptions.per.queue) is hit.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
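
To illustrate the idea in the issue description, here is a minimal sketch of how a per-queue exception counter could be turned into a longer fetch delay. It is not the Nutch implementation: the class BackoffDelaySketch, the method delayMillis, the exponential growth, the factor and the upper bound are all assumptions made for this example. The issue only states that the delay should increase with the counter.

```java
/**
 * Hypothetical sketch: derive a fetch delay from the number of exceptions
 * seen on a fetch queue. None of these names exist in Nutch.
 */
public final class BackoffDelaySketch {

    private BackoffDelaySketch() {}

    /**
     * Returns the delay in milliseconds to wait before the next request.
     *
     * @param baseDelayMs    the normally configured delay for the queue
     * @param exceptionCount current value of the queue's exception counter
     * @param factor         assumed growth factor per observed exception
     * @param maxDelayMs     assumed upper bound for the resulting delay
     */
    static long delayMillis(long baseDelayMs, int exceptionCount,
                            double factor, long maxDelayMs) {
        if (exceptionCount <= 0) {
            // No exceptions observed: keep the configured delay.
            return Math.min(baseDelayMs, maxDelayMs);
        }
        // Assumption: exponential growth, capped at maxDelayMs.
        double scaled = baseDelayMs * Math.pow(factor, exceptionCount);
        return (long) Math.min(scaled, (double) maxDelayMs);
    }

    public static void main(String[] args) {
        // Print the delay for an increasing number of exceptions.
        for (int n = 0; n <= 6; n++) {
            System.out.println(n + " exceptions -> "
                + delayMillis(5000L, n, 2.0, 60000L) + " ms");
        }
    }
}
```

The cap keeps the delay bounded for hosts that fail many times in a row. A setup like this would still rely on fetcher.max.exceptions.per.queue to stop fetching from the queue altogether once the limit is reached.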