[ https://issues.apache.org/jira/browse/NUTCH-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004271#comment-18004271 ]
ASF GitHub Bot commented on NUTCH-3114:
---------------------------------------
sebastian-nagel merged PR #853:
URL: https://github.com/apache/nutch/pull/853
> Avoid stale fetching when only URLs from queues blocked by the exponential
> backoff remain
> ------------------------------------------------------------------------------------------
>
> Key: NUTCH-3114
> URL: https://issues.apache.org/jira/browse/NUTCH-3114
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.19
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Major
> Fix For: 1.21
>
>
> The exponential backoff (NUTCH-2946) politely slows down fetching from queues
> where requests fail repeatedly with exceptions or HTTP status codes (503,
> 403, 429, etc.) mapped to the protocol status "EXCEPTION".
> However, the delay grows exponentially: starting with the default fetch delay
> of 5 seconds, the fetcher waits five minutes after the 8th exception. If all
> "good" queues are exhausted and neither a time limit
> ({{fetcher.timelimit.mins}}) nor a minimum throughput
> ({{fetcher.throughput.threshold.pages}}) is configured, fetching may stall
> until it is finally stopped by the task timeout.
> The default for {{fetcher.max.exceptions.per.queue}} should be set to a
> reasonably low value, so that queues where requests fail repeatedly with
> exceptions are purged. With the current default of {{-1}}, queues are never
> purged.
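To illustrate why a finite {{fetcher.max.exceptions.per.queue}} bounds the waiting time, here is a minimal Java sketch. It is a hypothetical model, not the actual Nutch fetcher code: it assumes the backoff delay simply doubles with every exception, starting from a 5-second base, and that a queue is purged as soon as its exception count reaches the configured maximum ({{-1}} meaning never). The class and method names are made up for illustration, and Nutch's exact formula and defaults may differ, so the numbers are illustrative only.

```java
/**
 * Hypothetical model of one fetch queue in which every request fails.
 * Assumption: the backoff delay doubles with every exception, starting at
 * baseDelayMs. Nutch's real backoff formula may differ.
 */
final class BackoffModel {

  /** Delay applied after the n-th exception (n >= 1) under the assumed doubling model. */
  static long delayAfterException(long baseDelayMs, int n) {
    int exponent = Math.min(n - 1, 30); // cap the exponent to avoid overflow
    return baseDelayMs * (1L << exponent);
  }

  /**
   * Total time spent waiting on a queue holding urlsInQueue URLs that all fail.
   * maxExceptions == -1 means the queue is never purged; otherwise the
   * remaining URLs are dropped as soon as the exception count reaches it.
   */
  static long totalWaitMs(long baseDelayMs, int urlsInQueue, int maxExceptions) {
    long total = 0;
    for (int n = 1; n <= urlsInQueue; n++) {
      if (maxExceptions != -1 && n >= maxExceptions) {
        break; // queue purged: no further backoff delays accumulate
      }
      total += delayAfterException(baseDelayMs, n);
    }
    return total;
  }

  public static void main(String[] args) {
    long base = 5_000L; // 5-second base delay, as in the issue description
    for (int n = 1; n <= 8; n++) {
      System.out.printf("exception %d -> wait %d s%n", n, delayAfterException(base, n) / 1000);
    }
    System.out.printf("10 failing URLs, never purged: %d s%n", totalWaitMs(base, 10, -1) / 1000);
    System.out.printf("10 failing URLs, max 5 exceptions: %d s%n", totalWaitMs(base, 10, 5) / 1000);
  }
}
```

In this model, the accumulated wait for an unpurged queue roughly doubles with every additional failing URL, while a finite maximum caps it at the delays incurred before the threshold is reached, no matter how many URLs remain in the queue.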
--
This message was sent by Atlassian Jira
(v8.20.10#820010)