Fetcher to skip queues for URLS getting repeated exceptions, based on percentage
--------------------------------------------------------------------------------

                 Key: NUTCH-1303
                 URL: https://issues.apache.org/jira/browse/NUTCH-1303
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher
    Affects Versions: 1.4
            Reporter: behnam nikbakht


As described in https://issues.apache.org/jira/browse/NUTCH-769, skipping
queues that accumulate many exceptions is a good solution, but it is hard to
choose a value for fetcher.max.exceptions.per.queue when queue sizes differ.
I suggest defining a ratio instead of an absolute value, so that a queue is
cleared once its ratio of exceptions to requests exceeds the configured limit.

Also, guarding the fetcher against high exception counts is not sufficient on
its own. The value of fetcher.throughput.threshold.pages ensures a worthwhile
fetch throughput against slow hosts, but when it triggers it clears all
queues, not only the slow one. I suggest that this threshold, like
fetcher.max.exceptions.per.queue, be enforced per queue rather than across
all of them.
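
To make the ratio idea concrete, here is a minimal sketch of a per-queue
counter. It is not Nutch code: the class name QueueExceptionRatio, the
maxRatio parameter, and the minRequests guard (which keeps a queue from being
cleared after its very first failure) are all assumptions made for
illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-queue counter; not part of the Nutch Fetcher. */
class QueueExceptionRatio {
    private final AtomicInteger requests = new AtomicInteger();
    private final AtomicInteger exceptions = new AtomicInteger();
    private final double maxRatio;  // e.g. 0.5 means "clear once more than half the requests failed"
    private final int minRequests;  // assumption: do not judge a queue on too few requests

    QueueExceptionRatio(double maxRatio, int minRequests) {
        this.maxRatio = maxRatio;
        this.minRequests = minRequests;
    }

    /** Record one fetch attempt for this queue and whether it threw an exception. */
    void recordRequest(boolean failed) {
        requests.incrementAndGet();
        if (failed) {
            exceptions.incrementAndGet();
        }
    }

    /** True when the queue has seen enough requests and its exception ratio exceeds the limit. */
    boolean shouldClear() {
        int total = requests.get();
        if (total < minRequests) {
            return false;
        }
        return (double) exceptions.get() / total > maxRatio;
    }
}
```

With a limit of 0.5, a queue with 6 exceptions out of 10 requests would be
cleared, while a queue with the same 6 exceptions out of 1000 requests would
not. A single absolute value of fetcher.max.exceptions.per.queue cannot
express that difference.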


        
