[ https://issues.apache.org/jira/browse/NUTCH-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881818#comment-17881818 ]
ASF GitHub Bot commented on NUTCH-3058:
---------------------------------------

lewismc commented on code in PR #820:
URL: https://github.com/apache/nutch/pull/820#discussion_r1759821839


##########
src/java/org/apache/nutch/fetcher/Fetcher.java:
##########
@@ -419,27 +419,43 @@ else if (bandwidthTargetCheckCounter == bandwidthTargetCheckEveryNSecs) {
               .increment(hitByTimeLimit);
         }
-        // some requests seem to hang, despite all intentions
+        /*
+         * Some requests seem to hang, with no fetches finished and no new
+         * fetches started during half of the MapReduce task timeout
+         * (mapreduce.task.timeout, default value: 15 minutes). In order to
+         * avoid that the task timeout is hit and the fetcher job is failed,
+         * we stop the fetching now.
+         */
         if ((System.currentTimeMillis() - lastRequestStart.get()) > timeout) {
-          if (LOG.isWarnEnabled()) {
-            LOG.warn("Aborting with {} hung threads.", activeThreads);
-            for (int i = 0; i < fetcherThreads.size(); i++) {
-              FetcherThread thread = fetcherThreads.get(i);
-              if (thread.isAlive()) {
-                LOG.warn("Thread #{} hung while processing {}", i,
-                    thread.getReprUrl());
-                if (LOG.isDebugEnabled()) {
-                  StackTraceElement[] stack = thread.getStackTrace();
-                  StringBuilder sb = new StringBuilder();
-                  sb.append("Stack of thread #").append(i).append(":\n");
-                  for (StackTraceElement s : stack) {
-                    sb.append(s.toString()).append('\n');
-                  }
-                  LOG.debug(sb.toString());
+          LOG.warn("Aborting with {} hung threads.", activeThreads);
+          innerContext.getCounter("FetcherStatus", "hungThreads")
+              .increment(activeThreads.get());
+          for (int i = 0; i < fetcherThreads.size(); i++) {
+            FetcherThread thread = fetcherThreads.get(i);
+            if (thread.isAlive()) {
+              LOG.warn("Thread #{} hung while processing {}", i,

Review Comment:
Yes, I think WARN logging is called for here. That sounds like a better solution. Thanks.
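The patched abort path can be illustrated with a minimal, self-contained sketch. It mirrors the diff: add the number of active threads to a "FetcherStatus"/"hungThreads" counter, then emit one WARN-style line per fetcher thread that is still alive. Everything else is an assumption for illustration: the class HungThreadReport, the Worker interface (standing in for FetcherThread), the COUNTERS map (standing in for Hadoop's job counters) and the returned strings (standing in for SLF4J LOG.warn calls) are hypothetical and are not Nutch or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the hung-thread abort path from the diff above. */
public class HungThreadReport {

  /** Hypothetical stand-in for FetcherThread: only what the report needs. */
  interface Worker {
    boolean isAlive();
    String reprUrl();
  }

  /** Hypothetical stand-in for Hadoop job counters, keyed "group/name". */
  static final Map<String, Long> COUNTERS = new TreeMap<>();

  static void increment(String group, String name, long amount) {
    COUNTERS.merge(group + "/" + name, amount, Long::sum);
  }

  /**
   * Adds the number of active threads to FetcherStatus/hungThreads and
   * returns the WARN-style lines: one summary plus one per alive worker.
   */
  static List<String> abortWithHungThreads(AtomicInteger activeThreads,
      List<Worker> workers) {
    List<String> warnings = new ArrayList<>();
    warnings.add("Aborting with " + activeThreads.get() + " hung threads.");
    increment("FetcherStatus", "hungThreads", activeThreads.get());
    for (int i = 0; i < workers.size(); i++) {
      Worker w = workers.get(i);
      if (w.isAlive()) {
        warnings.add("Thread #" + i + " hung while processing " + w.reprUrl());
      }
    }
    return warnings;
  }

  /** Builds a fixed-state worker for the demo. */
  static Worker worker(boolean alive, String url) {
    return new Worker() {
      public boolean isAlive() { return alive; }
      public String reprUrl() { return url; }
    };
  }

  public static void main(String[] args) {
    List<Worker> workers = List.of(
        worker(true, "https://example.org/slow"),
        worker(false, "https://example.org/done"),
        worker(true, "https://example.com/stuck"));
    abortWithHungThreads(new AtomicInteger(2), workers)
        .forEach(System.out::println);
    System.out.println(COUNTERS);
  }
}
```

As in the diff, the counter is incremented by the active-thread count rather than by the number of alive threads found in the loop, so the two numbers can differ; per-URL details remain in the logs only.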
> Fetcher: counter for hung threads
> ---------------------------------
>
> Key: NUTCH-3058
> URL: https://issues.apache.org/jira/browse/NUTCH-3058
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 1.20
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Major
> Fix For: 1.21
>
>
> The Fetcher class defines a "hard" timeout of 50% of the MapReduce
> task timeout, see {{mapreduce.task.timeout}} and
> {{fetcher.threads.timeout.divisor}}. If fetcher threads are running but
> make no progress during the timeout period (in terms of newly started
> fetch items), Fetcher is shut down to prevent the task timeout from being
> reached and the fetcher job from failing. The "hung threads" are logged
> together with the URL being fetched and (at DEBUG level) the Java stack.
> In addition to logging, a job counter should indicate the number of hung
> threads. This would make it possible to see at the job level whether there
> are issues with hung threads. To trace the issues, it is still necessary to
> look into the Hadoop task logs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
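The "hard" timeout described in the issue can be sketched as follows, assuming the timeout is the task timeout divided by fetcher.threads.timeout.divisor (a divisor of 2 gives the 50% mentioned above) and that the check compares the time since the last started request against it, as the `lastRequestStart` comparison in the diff does. The class HardTimeoutCheck and its methods hardTimeoutMs and shouldAbort are hypothetical names for illustration; the 10-minute task timeout in main is only an example value, not a claim about any default.

```java
/** Hypothetical sketch of the hard fetcher timeout check. */
public class HardTimeoutCheck {

  /** Hard timeout in ms: the task timeout divided by the timeout divisor. */
  static long hardTimeoutMs(long taskTimeoutMs, int timeoutDivisor) {
    if (timeoutDivisor <= 0) {
      throw new IllegalArgumentException("timeout divisor must be positive");
    }
    return taskTimeoutMs / timeoutDivisor;
  }

  /** True if no new fetch has started for longer than the hard timeout. */
  static boolean shouldAbort(long nowMs, long lastRequestStartMs,
      long hardTimeoutMs) {
    return (nowMs - lastRequestStartMs) > hardTimeoutMs;
  }

  public static void main(String[] args) {
    long taskTimeout = 10 * 60 * 1000L; // example value only: 10 minutes
    long timeout = hardTimeoutMs(taskTimeout, 2);
    System.out.println("hard timeout ms: " + timeout);
    long lastStart = 0L;
    // 4 minutes without a new fetch: keep going
    System.out.println(shouldAbort(4 * 60 * 1000L, lastStart, timeout));
    // 6 minutes without a new fetch: abort before the task timeout is hit
    System.out.println(shouldAbort(6 * 60 * 1000L, lastStart, timeout));
  }
}
```

Aborting at a fraction of the task timeout leaves the remaining time for the task to shut down cleanly (and, with this issue, to record the hung-thread counter) before the framework would kill it.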