[
https://issues.apache.org/jira/browse/NUTCH-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117234#comment-13117234
]
Lewis John McGibbney commented on NUTCH-629:
--------------------------------------------
What is the status of this issue? There is no explanation of why it was not
included in previous Nutch releases. However, as Julien suggests, the problem
of the fetcher being bogged down by slow server responses has been somewhat
(fully?) addressed by subsequent Jira issues, which have now been resolved,
fixed and closed.
> Detect slow and timeout servers and drop their URLs
> ---------------------------------------------------
>
> Key: NUTCH-629
> URL: https://issues.apache.org/jira/browse/NUTCH-629
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Reporter: Otis Gospodnetic
> Assignee: Otis Gospodnetic
> Attachments: NUTCH-629.patch
>
>
> Fetch jobs will finish faster if we find a way to prevent servers that are
> either slow or time out from slowing down the whole process.
> I'll attach a patch that counts per-server exceptions and timeouts and tracks
> download speed per server.
> Queues/servers that exceed timeout or download thresholds are marked as
> "tooManyErrors" or "tooSlow". Once they get marked as such, all of their
> subsequent URLs get dropped (i.e. they are not fetched) and marked GONE.
> At the end of the fetch task, stats for each server processed are printed.
> Also, I believe the per-host/domain/TLD/etc. DB from NUTCH-628 would be the
> right place to add server data collected by this patch.
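
For readers who have not opened the attachment, the bookkeeping described in
the quoted issue could look roughly like the sketch below. This is not the
code from NUTCH-629.patch: the class name ServerHealthTracker, its methods,
and the three thresholds (maxErrors, minBytesPerSec, minSamples) are
hypothetical names chosen for illustration, and it assumes that exceptions
and timeouts share one counter and that speed is averaged over all successful
fetches from a server.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-server tracker; an illustration, not the NUTCH-629 patch. */
class ServerHealthTracker {
    enum Status { OK, TOO_MANY_ERRORS, TOO_SLOW }

    /** Counters kept for one server (or fetch queue). */
    private static final class Stats {
        int errors;      // exceptions and timeouts seen so far
        int fetches;     // successful fetches seen so far
        long bytes;      // total bytes downloaded
        long millis;     // total time spent downloading
        Status status = Status.OK;
    }

    private final int maxErrors;          // assumed threshold: errors tolerated per server
    private final double minBytesPerSec;  // assumed threshold: minimum average speed
    private final int minSamples;         // fetches required before judging speed
    private final Map<String, Stats> byServer = new HashMap<>();

    ServerHealthTracker(int maxErrors, double minBytesPerSec, int minSamples) {
        this.maxErrors = maxErrors;
        this.minBytesPerSec = minBytesPerSec;
        this.minSamples = minSamples;
    }

    private Stats stats(String server) {
        return byServer.computeIfAbsent(server, k -> new Stats());
    }

    /** Record an exception or timeout; mark the server once the limit is exceeded. */
    void recordError(String server) {
        Stats s = stats(server);
        s.errors++;
        if (s.status == Status.OK && s.errors > maxErrors) {
            s.status = Status.TOO_MANY_ERRORS;
        }
    }

    /** Record a successful fetch; mark the server if its average speed is too low. */
    void recordSuccess(String server, long bytes, long millis) {
        Stats s = stats(server);
        s.fetches++;
        s.bytes += bytes;
        s.millis += millis;
        if (s.status == Status.OK && s.fetches >= minSamples && s.millis > 0) {
            double bytesPerSec = s.bytes * 1000.0 / s.millis;
            if (bytesPerSec < minBytesPerSec) {
                s.status = Status.TOO_SLOW;
            }
        }
    }

    Status status(String server) {
        Stats s = byServer.get(server);
        return s == null ? Status.OK : s.status;
    }

    /** True once a server is marked; the caller would then skip its remaining URLs. */
    boolean shouldDrop(String server) {
        return status(server) != Status.OK;
    }

    /** One line per server, for printing at the end of a fetch task. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Stats> e : byServer.entrySet()) {
            Stats s = e.getValue();
            sb.append(e.getKey())
              .append(" status=").append(s.status)
              .append(" errors=").append(s.errors)
              .append(" fetches=").append(s.fetches)
              .append(" bytes=").append(s.bytes)
              .append(" millis=").append(s.millis)
              .append('\n');
        }
        return sb.toString();
    }
}
```

In this sketch a fetcher thread would call shouldDrop(server) before each
fetch and, when it returns true, mark the URL GONE instead of fetching it.
Once a server is marked it stays marked for the rest of the task, which
matches the "all of their subsequent URLs get dropped" behaviour described
above.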