Sebastian Nagel created NUTCH-3114:
--------------------------------------

             Summary: Avoid stale fetching when only URLs from queues blocked by the exponential backoff remain
                 Key: NUTCH-3114
                 URL: https://issues.apache.org/jira/browse/NUTCH-3114
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 1.19
            Reporter: Sebastian Nagel
            Assignee: Sebastian Nagel
             Fix For: 1.21

The exponential backoff (NUTCH-2946) politely slows down fetching from queues where requests fail repeatedly with exceptions or HTTP status codes (503, 403, 429, etc.) mapped to the protocol status "EXCEPTION". However, the delay grows exponentially: starting with the default fetch delay of 5 seconds, the fetcher waits for five minutes after the 8th exception. If all "good" queues are exhausted and neither a time limit ({{fetcher.timelimit.mins}}) nor a minimum throughput ({{fetcher.throughput.threshold.pages}}) is configured, fetching may stall until the task is finally stopped by the task timeout.

The default for {{fetcher.max.exceptions.per.queue}} should be set to a reasonably low value, so that queues where requests fail repeatedly with exceptions are purged. With the current default of {{-1}}, queues are never purged.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
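
To illustrate why a purge threshold matters, here is a rough Python sketch. It is not Nutch code. It assumes that the delay simply doubles with every consecutive exception, starting from the base delay, and that a queue is purged as soon as its exception count reaches the threshold. The exact formula and offsets used by the fetcher may differ, so the numbers are illustrative only. The function names {{backoff_delays}} and {{total_wait}} are made up for this example.

```python
def backoff_delays(base_delay_s, num_exceptions):
    """Delays (in seconds) applied after the 1st, 2nd, ... consecutive exception.

    Assumption (not taken from the Nutch source): the delay doubles with
    every consecutive exception, starting at base_delay_s.
    """
    return [base_delay_s * 2 ** k for k in range(num_exceptions)]


def total_wait(base_delay_s, failing_urls, max_exceptions=-1):
    """Total seconds spent waiting on a queue in which every request fails.

    max_exceptions=-1 means the queue is never purged. Otherwise
    (assumption) the queue is purged as soon as the exception count
    reaches max_exceptions, and its remaining URLs are dropped.
    """
    waited = 0
    for exceptions in range(1, failing_urls + 1):
        if max_exceptions != -1 and exceptions >= max_exceptions:
            break  # queue purged: no further waiting on this queue
        waited += base_delay_s * 2 ** (exceptions - 1)
    return waited


if __name__ == "__main__":
    print(backoff_delays(5, 4))                 # [5, 10, 20, 40]
    print(total_wait(5, 10))                    # 5115 (never purged)
    print(total_wait(5, 10, max_exceptions=4))  # 35 (purged at the 4th exception)
```

Under these assumptions, ten failing URLs in a single queue cost 5115 seconds of waiting if the queue is never purged, compared to 35 seconds with a threshold of 4.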