[ https://issues.apache.org/jira/browse/NUTCH-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel reassigned NUTCH-1182:
--------------------------------------

    Assignee: Sebastian Nagel

> fetcher should track and shut down hung threads
> -----------------------------------------------
>
>                 Key: NUTCH-1182
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1182
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.3, 1.4
>         Environment: Linux, local job runner
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 2.4, 1.9
>
>         Attachments: NUTCH-1182-2x.patch, NUTCH-1182-trunk-v1.patch
>
>
> While crawling a slow server with a couple of very large PDF documents (30 MB) on it
> after some time and a bulk of successfully fetched documents the fetcher stops
> with the message: ??Aborting with 10 hung threads.??
> From now on every cycle ends with hung threads, almost no documents are fetched
> successfully. In addition, strange hadoop errors are logged:
> {noformat}
>    fetch of http://.../xyz.pdf failed with: java.lang.NullPointerException
>     at java.lang.System.arraycopy(Native Method)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1108)
>     ...
> {noformat}
> or
> {noformat}
>    Exception in thread "QueueFeeder" java.lang.NullPointerException
>     at org.apache.hadoop.fs.BufferedFSInputStream.getPos(BufferedFSInputStream.java:48)
>     at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:41)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:214)
> {noformat}
> I've run the debugger and found:
> # after the "hung threads" are reported the fetcher stops but the threads are
> still alive and continue fetching a document.
> In consequence, this will
> #* limit the small bandwidth of network/server even more
> #* after the document is fetched the thread tries to write the content via
> {{output.collect()}} which must fail because the fetcher map job is already
> finished and the associated temporary mapred directory is deleted. The error
> message may get mixed with the progress output of the next fetch cycle
> causing additional confusion.
> # documents/URLs causing the hung thread are never reported nor stored. That
> is, it's hard to track them down, and they will cause a hung thread again and
> again.
> The problem is reproducible when fetching bigger documents and setting
> {{mapred.task.timeout}} to a low value (this will definitely cause hung
> threads).
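
The two findings above (threads that keep running after the fetcher gives up, and URLs that are never reported) suggest keeping a per-thread record of the URL being fetched. The following is only a minimal sketch of that idea, not the code from the attached patches: the class `HungThreadTracker` and its methods `begin`, `end` and `interruptHung` are hypothetical names invented for illustration. Note also that `Thread.interrupt()` only helps if the blocked code reacts to interruption, and a thread blocked in a classic socket read may not, so a real fix may need more than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Nutch): remembers which URL each fetcher thread works on. */
final class HungThreadTracker {

    /** The URL a thread is fetching and the System.nanoTime() value when it started. */
    private static final class Activity {
        final String url;
        final long startedAtNanos;

        Activity(String url, long startedAtNanos) {
            this.url = url;
            this.startedAtNanos = startedAtNanos;
        }
    }

    private final Map<Thread, Activity> active = new ConcurrentHashMap<>();

    /** Called by a fetcher thread right before it starts fetching a URL. */
    void begin(String url) {
        active.put(Thread.currentThread(), new Activity(url, System.nanoTime()));
    }

    /** Called by the same thread once the fetch has finished or failed. */
    void end() {
        active.remove(Thread.currentThread());
    }

    /**
     * Interrupts every thread that has been busy with a single URL for longer
     * than timeoutMillis and returns the affected URLs so the caller can log
     * or store them.
     */
    List<String> interruptHung(long timeoutMillis) {
        long now = System.nanoTime();
        long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        List<String> hungUrls = new ArrayList<>();
        for (Map.Entry<Thread, Activity> entry : active.entrySet()) {
            Activity activity = entry.getValue();
            if (now - activity.startedAtNanos > timeoutNanos) {
                hungUrls.add(activity.url);
                entry.getKey().interrupt();
                // Remove only if the thread has not moved on to another URL meanwhile.
                active.remove(entry.getKey(), activity);
            }
        }
        return hungUrls;
    }
}
```

In such a design each fetcher thread would call `begin(url)` before a fetch and `end()` in a `finally` block, while the code that currently prints the "hung threads" message would call `interruptHung(...)` and log the returned URLs before the map task finishes.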