[
https://issues.apache.org/jira/browse/NUTCH-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137005#comment-13137005
]
Markus Jelsma commented on NUTCH-1182:
--------------------------------------
The problem is that the Fetcher declares a timeout whenever the most recent
request was started too long ago:
{code}
if ((System.currentTimeMillis() - lastRequestStart.get()) > timeout) {
{code}
To let many very slow/large fetches complete (when there are no smaller
fetches in other threads to keep updating lastRequestStart), we would need to
change this mechanism so that a status such as lastRequestStart can be updated
from within the blocking fetch.
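A minimal sketch of that idea: track a shared "last activity" timestamp in an
{{AtomicLong}}, let each fetcher thread touch it as chunks of a response
arrive (not only when the request starts), and have the watchdog compare
against it. This is illustrative only, not Nutch's actual Fetcher code; the
class and method names ({{FetchHeartbeat}}, {{touch}}, {{timedOut}}) are
invented, and the clock is passed in explicitly to keep the example
deterministic.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical heartbeat shared between fetcher threads and the watchdog.
// A slow-but-alive download keeps calling touch() while it reads, so it is
// not mistaken for a hung thread.
class FetchHeartbeat {
    private final AtomicLong lastActivity;

    FetchHeartbeat(long startMillis) {
        lastActivity = new AtomicLong(startMillis);
    }

    // Called by a fetch thread whenever it makes progress, e.g. after each
    // buffer read inside the blocking fetch.
    void touch(long nowMillis) {
        lastActivity.set(nowMillis);
    }

    // Called by the watchdog: true only if no thread has reported progress
    // within the timeout window.
    boolean timedOut(long nowMillis, long timeoutMillis) {
        return (nowMillis - lastActivity.get()) > timeoutMillis;
    }
}
```

In a fetch loop this would look roughly like
{{while ((n = in.read(buf)) > 0) { out.write(buf, 0, n); heartbeat.touch(System.currentTimeMillis()); }}},
replacing the single check against the request's start time quoted above.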
> fetcher should track and shut down hung threads
> -----------------------------------------------
>
> Key: NUTCH-1182
> URL: https://issues.apache.org/jira/browse/NUTCH-1182
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.3, 1.4
> Environment: Linux, local job runner
> Reporter: Sebastian Nagel
> Priority: Minor
>
> While crawling a slow server hosting a couple of very large PDF documents
> (30 MB), after some time and a bulk of successfully fetched documents the
> fetcher stops with the message: ??Aborting with 10 hung threads.??
> From then on every cycle ends with hung threads, and almost no documents are
> fetched successfully. In addition, strange Hadoop errors are logged:
> {noformat}
> fetch of http://.../xyz.pdf failed with: java.lang.NullPointerException
> at java.lang.System.arraycopy(Native Method)
> at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1108)
> ...
> {noformat}
> or
> {noformat}
> Exception in thread "QueueFeeder" java.lang.NullPointerException
> at
> org.apache.hadoop.fs.BufferedFSInputStream.getPos(BufferedFSInputStream.java:48)
> at
> org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:41)
> at
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:214)
> {noformat}
> I've run the debugger and found:
> # After the "hung threads" are reported the fetcher stops, but the threads
> are still alive and continue fetching their documents. In consequence, this
> will
> #* further limit the already small bandwidth of network/server
> #* after the document is fetched, the thread tries to write the content via
> {{output.collect()}}, which must fail because the fetcher map job is already
> finished and the associated temporary mapred directory is deleted. The error
> message may get mixed with the progress output of the next fetch cycle,
> causing additional confusion.
> # Documents/URLs causing a hung thread are neither reported nor stored. That
> is, they are hard to track down, and they will cause a hung thread again and
> again.
> The problem is reproducible by fetching larger documents with
> {{mapred.task.timeout}} set to a low value (this reliably provokes hung
> threads).