Fetcher to skip queues for URLS getting repeated exceptions, based on percentage
--------------------------------------------------------------------------------
Key: NUTCH-1303
URL: https://issues.apache.org/jira/browse/NUTCH-1303
Project: Nutch
Issue Type: Improvement
Components: fetcher
Affects Versions: 1.4
Reporter: behnam nikbakht
As described in https://issues.apache.org/jira/browse/NUTCH-769, skipping
queues with a high exception count is a good solution, but it is not easy to
choose a value for fetcher.max.exceptions.per.queue when queue sizes differ.
I suggest defining a ratio instead of an absolute value, so that a queue is
cleared once its ratio of exceptions to requests exceeds the configured limit.
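
A minimal sketch of the ratio check proposed above. The class QueueStats, its
methods, and the minRequests guard are hypothetical names introduced here for
illustration; they are not part of Nutch's fetcher code or configuration. The
guard is an added assumption so that a queue is not cleared after a single
failed request.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-queue counters; not Nutch's actual queue class. */
class QueueStats {
    private final AtomicInteger requests = new AtomicInteger();
    private final AtomicInteger exceptions = new AtomicInteger();
    private final double maxExceptionRatio; // assumed setting, e.g. 0.5 = 50%
    private final int minRequests;          // assumed guard against tiny samples

    QueueStats(double maxExceptionRatio, int minRequests) {
        this.maxExceptionRatio = maxExceptionRatio;
        this.minRequests = minRequests;
    }

    /** Record one fetch attempt and whether it ended in an exception. */
    void record(boolean failed) {
        requests.incrementAndGet();
        if (failed) {
            exceptions.incrementAndGet();
        }
    }

    /** True once exceptions / requests exceeds the configured ratio. */
    boolean shouldSkip() {
        int total = requests.get();
        if (total < minRequests) {
            return false; // too few requests to judge the queue
        }
        return (double) exceptions.get() / total > maxExceptionRatio;
    }
}
```

With a ratio, a queue of 100 URLs with 10 failures and a queue of 4 URLs with
3 failures are judged on the same scale, which a fixed exception count cannot
do.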
Also, guarding the fetcher against high exception counts is not sufficient on
its own. The value of fetcher.throughput.threshold.pages ensures that a
worthwhile fetch throughput is maintained against slow hosts, but it clears
all queues, not just the slow ones. I suggest that this threshold, like
fetcher.max.exceptions.per.queue, be enforced per queue rather than across
all of them.
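
A rough sketch of what a per-queue throughput check could look like. The class
QueueThroughput and its methods are hypothetical and only illustrate the idea;
they do not describe how Nutch applies fetcher.throughput.threshold.pages.
Timestamps are passed in explicitly to keep the example self-contained.

```java
/** Hypothetical per-queue throughput tracker; not Nutch's actual code. */
class QueueThroughput {
    private final double minPagesPerSecond; // assumed per-queue threshold
    private final long startMillis;         // when this queue started fetching
    private int pagesFetched;

    QueueThroughput(double minPagesPerSecond, long startMillis) {
        this.minPagesPerSecond = minPagesPerSecond;
        this.startMillis = startMillis;
    }

    /** Count one successfully fetched page for this queue. */
    void recordPage() {
        pagesFetched++;
    }

    /** True if this queue alone has fallen below the threshold. */
    boolean isTooSlow(long nowMillis) {
        long elapsed = nowMillis - startMillis;
        if (elapsed <= 0) {
            return false; // no time has passed, nothing to measure yet
        }
        double pagesPerSecond = pagesFetched * 1000.0 / elapsed;
        return pagesPerSecond < minPagesPerSecond;
    }
}
```

Under this scheme only the queue whose own rate drops below the threshold
would be cleared, and faster queues would keep fetching.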