[
https://issues.apache.org/jira/browse/NUTCH-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086743#comment-18086743
]
ASF GitHub Bot commented on NUTCH-3177:
---------------------------------------
sebastian-nagel commented on PR #915:
URL: https://github.com/apache/nutch/pull/915#issuecomment-4643189463
> I wonder if the `hungThreadsCounter` should be renamed to accommodate idle
> threads as well?
Thanks, good point. Alternatively, it should not count idle threads at all.
Or, even better, the fetcher could exit earlier if the only remaining queues
all have delays exceeding the hard timeout imposed by MapReduce. I need to
think about it and will open a new issue.
> Fetcher to report idle threads not as hung threads
> --------------------------------------------------
>
> Key: NUTCH-3177
> URL: https://issues.apache.org/jira/browse/NUTCH-3177
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.22
> Reporter: Sebastian Nagel
> Priority: Minor
> Fix For: 1.23
>
>
> If no URL is fetched during half of the MapReduce task timeout, Fetcher
> shuts down to prevent the fetcher map task from failing because of missing
> progress. Before the shutdown, Fetcher reports the remaining FetcherThreads
> as "hung threads" together with the URL being fetched. This is meant to help
> debug the URLs / pages causing timeouts. The report uses the field
> {{reprUrl}} of FetcherThread. However, the field is not reset after a fetch
> is done. As a consequence, the reported URL is not necessarily one whose
> fetch is in progress. It might be the URL that was fetched last, while the
> thread is now idle and waiting for the next fetch item to be ready. This
> happens if there are still fetch queues, but with long delays because of a
> robots.txt Crawl-delay or a longer delay because of the exponential back-off.
> FetcherThread should reset the {{reprUrl}} once a fetch is finished. Idle
> FetcherThreads shouldn't be reported as hanging.
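
For illustration, the proposed behavior could look roughly like the sketch
below. It is not the actual Nutch code: the class HungThreadReportSketch, the
nested WorkerState and the methods beginFetch, endFetch and reportHungThreads
are hypothetical names, and only the field name reprUrl is taken from the
issue. It assumes that a null reprUrl marks an idle thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the actual Nutch Fetcher or FetcherThread. */
class HungThreadReportSketch {

  /** Hypothetical stand-in for the per-thread state kept by a fetcher thread. */
  static class WorkerState {
    private final String name;
    // URL of the fetch currently in progress; null while the thread is idle
    // (assumption made for this sketch).
    private volatile String reprUrl;

    WorkerState(String name) {
      this.name = name;
    }

    /** Record the URL before the fetch starts. */
    void beginFetch(String url) {
      reprUrl = url;
    }

    /** Reset the field so a finished fetch is never reported later. */
    void endFetch() {
      reprUrl = null;
    }

    String getName() {
      return name;
    }

    String getReprUrl() {
      return reprUrl;
    }
  }

  /** Returns one line per thread with a fetch in progress; idle threads are skipped. */
  static List<String> reportHungThreads(List<WorkerState> workers) {
    List<String> lines = new ArrayList<>();
    for (WorkerState w : workers) {
      String url = w.getReprUrl(); // read once, the field may change concurrently
      if (url != null) {
        lines.add(w.getName() + " hung while fetching " + url);
      }
    }
    return lines;
  }
}
```

With this shape, a thread that finished its last fetch and is waiting on a
delayed queue contributes nothing to the report, so only URLs with a fetch
actually in progress show up.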
--
This message was sent by Atlassian Jira
(v8.20.10#820010)