[ https://issues.apache.org/jira/browse/NUTCH-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004264#comment-18004264 ]
ASF GitHub Bot commented on NUTCH-3114:
---------------------------------------

sebastian-nagel opened a new pull request, #853:
URL: https://github.com/apache/nutch/pull/853

   Modify the default value of the configuration property `fetcher.max.exceptions.per.queue` from `-1` ("unlimited") to `5`, so that blocked queues are purged earlier.


> Avoid stale fetching when only URLs from queues blocked by the exponential backoff remain
> ------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-3114
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3114
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.19
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.21
>
>
> The exponential backoff (NUTCH-2946) politely slows down fetching from queues
> where requests fail repeatedly with exceptions or HTTP status codes (503,
> 403, 429, etc.) mapped to the protocol status "EXCEPTION".
> However, the delay grows exponentially: starting with the default
> fetch delay of 5 seconds, after the 8th exception the fetcher waits for five
> minutes. If all "good" queues are exhausted and there is no time limit
> ({{fetcher.timelimit.mins}}) or minimum throughput
> ({{fetcher.throughput.threshold.pages}}) configured, fetching may
> become stale until it is finally stopped by the task timeout.
> The default for {{fetcher.max.exceptions.per.queue}} should be set to a
> reasonably low value, so that queues where requests fail repeatedly with
> exceptions are purged. With the current default of {{-1}}, queues are never
> purged.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
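
For installations that still ship the old default of `-1`, the proposed behaviour can presumably be approximated by overriding the property locally. The fragment below is a minimal sketch, assuming the usual Nutch convention of placing overrides in `conf/nutch-site.xml`; the value `5` simply mirrors the default proposed in the pull request, and the description text is illustrative rather than copied from `nutch-default.xml`.

```xml
<!-- Sketch only: goes inside the <configuration> element of conf/nutch-site.xml. -->
<property>
  <name>fetcher.max.exceptions.per.queue</name>
  <!-- 5 mirrors the default proposed in the pull request; -1 means "unlimited". -->
  <value>5</value>
  <description>Illustrative note (not the upstream wording): purge a fetch
  queue once requests to it have failed repeatedly with exceptions, instead
  of waiting out ever longer backoff delays.</description>
</property>
```

A lower threshold drops failing hosts sooner, while a higher one gives temporarily unavailable servers more chances to recover; which is appropriate likely depends on the crawl.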