Hi. I've been trying to get the Nutch fetcher to work, but it always hangs in one of the reduce tasks, and the job fails. I am using 160 map tasks and 16 reduce tasks during fetch on an 8-machine cluster. The fetch job only fetches; it does not parse content.
Everything works fine, but one reduce task out of N fails in the last step. I don't understand what's going on. Why would a reduce task fail when it is a simple identity reduce? I changed the OS from Linux to FreeBSD one month ago. Is this an OS problem? I'd appreciate it if anyone could share their experience.

Environment:

FreeBSD 6.2-RELEASE 64-bit
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build diablo-1.5.0-b01)
Java HotSpot(TM) 64-Bit Server VM (build diablo-1.5.0_07-b01, mixed mode)
Hadoop-0.11.2

Configuration:

160 maps
16 reduce
8 node cluster

Jobtracker-webui:

Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failures
map     100.00%     281        0        0        281       0       0
reduce  93.89%      16         0        0        15        1       4

Failed-webui page:

Attempt               Task               Machine     Error                                                    Logs
task_0010_r_000000_0  tip_0010_r_000000  task_node2  Task failed to report status for 603 seconds. Killing.
task_0010_r_000000_1  tip_0010_r_000000  task_node7  Task failed to report status for 609 seconds. Killing.
task_0010_r_000000_2  tip_0010_r_000000  task_node5  Task failed to report status for 601 seconds. Killing.
task_0010_r_000000_3  tip_0010_r_000000  task_node4  Task failed to report status for 602 seconds. Killing.

Jobtracker-log:

mapred.TaskInProgress - Error from task_0010_r_000000_0: Task failed to report status for 603 seconds. Killing.
mapred.TaskInProgress - Task 'task_0010_r_000000_0' has been lost.
mapred.JobTracker - Removed completed task 'task_0010_r_000000_0' from 'tracker_task_node2:50050'

Tasktracker-log:

Task failed to report status for 603 seconds. Killing.
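All four failed attempts above were killed just after 600 seconds of silence (601 to 609 seconds), which suggests a status-report limit of about 600 seconds. As a rough illustration of that kind of check, here is a minimal sketch. It is not Hadoop's TaskTracker code: the class `StatusWatchdog`, its methods, and the 600,000 ms limit are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical watchdog for illustration only; not part of Hadoop. */
public class StatusWatchdog {
    private final long timeoutMillis;
    // Last time (in ms) each task reported status.
    private final Map<String, Long> lastReport = new LinkedHashMap<>();

    public StatusWatchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Record that a task reported status at the given time. */
    public void report(String taskId, long nowMillis) {
        lastReport.put(taskId, nowMillis);
    }

    /**
     * Return a kill message if the task has been silent for longer than
     * the timeout, or null if it is still within the limit (or unknown).
     */
    public String check(String taskId, long nowMillis) {
        Long last = lastReport.get(taskId);
        if (last == null) {
            return null;
        }
        long silentMillis = nowMillis - last;
        if (silentMillis <= timeoutMillis) {
            return null;
        }
        return "Task failed to report status for "
                + (silentMillis / 1000) + " seconds. Killing.";
    }

    public static void main(String[] args) {
        // Assumed limit of 600 seconds, inferred from the kill times above.
        StatusWatchdog watchdog = new StatusWatchdog(600_000L);
        watchdog.report("task_0010_r_000000_0", 0L);
        // 599 s of silence: still within the limit, prints "null".
        System.out.println(watchdog.check("task_0010_r_000000_0", 599_000L));
        // 603 s of silence: over the limit, prints the kill message.
        System.out.println(watchdog.check("task_0010_r_000000_0", 603_000L));
    }
}
```

The sketch only shows why a task that stays silent is reported this way; it says nothing about why the reduce task stopped reporting in the first place.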
Process Thread Dump: lost task
17 active threads

Thread 13014 (IPC Client connection to job_node/10.8.50.31:9001):
  State: WAITING
  Blocked count: 1
  Waited count: 1
  Waiting on [EMAIL PROTECTED]
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:474)
    org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:209)
    org.apache.hadoop.ipc.Client$Connection.run(Client.java:248)

Thread 12538 (Thread-11192):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.io.FileInputStream.readBytes(Native Method)
    java.io.FileInputStream.read(FileInputStream.java:194)
    org.apache.hadoop.mapred.TaskRunner.logStream(TaskRunner.java:363)
    org.apache.hadoop.mapred.TaskRunner.access$100(TaskRunner.java:33)
    org.apache.hadoop.mapred.TaskRunner$1.run(TaskRunner.java:326)

Thread 12537 (process reaper):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.lang.UNIXProcess.waitForProcessExit(Native Method)
    java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
    java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)

Thread 10924 (SocketListener0-9):
  State: TIMED_WAITING
  Blocked count: 2
  Waited count: 734
  Stack:
    java.lang.Object.wait(Native Method)
    org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:522)

Thread 5608 (Thread-4636):
  State: RUNNABLE
  Blocked count: 763
  Waited count: 4356
  Stack:
    java.io.FileInputStream.readBytes(Native Method)
    java.io.FileInputStream.read(FileInputStream.java:194)
    java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
    java.io.BufferedInputStream.read(BufferedInputStream.java:313)
    org.apache.hadoop.mapred.TaskRunner.logStream(TaskRunner.java:363)
    org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:330)
    org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:220)

Thread 20 ([EMAIL PROTECTED]):
  State: TIMED_WAITING
  Blocked count: 0
  Waited count: 0
  Stack:
    java.lang.Thread.sleep(Native Method)
    org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:465)
    java.lang.Thread.run(Thread.java:595)

Thread 16 (org.apache.hadoop.io.ObjectWritable Connection Culler):
  State: TIMED_WAITING
  Blocked count: 14
  Waited count: 0
  Stack:
    java.lang.Thread.sleep(Native Method)
    org.apache.hadoop.ipc.Client$ConnectionCuller.run(Client.java:397)

Thread 15 (IPC Server handler 1 on 50050):
  State: TIMED_WAITING
  Blocked count: 1132
  Waited count: 50998
  Stack:
    java.lang.Object.wait(Native Method)
    org.apache.hadoop.ipc.Server$Handler.run(Server.java:510)

Thread 14 (IPC Server handler 0 on 50050):
  State: BLOCKED
  Blocked count: 1173
  Waited count: 50996
  Blocked on [EMAIL PROTECTED]
  Blocked by 1 (main)
  Stack:
    org.apache.hadoop.mapred.TaskTracker.ping(TaskTracker.java:1261)
    sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    java.lang.reflect.Method.invoke(Method.java:585)
    org.apache.hadoop.ipc.RPC$Server.call(RPC.java:337)
    org.apache.hadoop.ipc.Server$Handler.run(Server.java:538)

Thread 13 (IPC Server listener on 50050):
  State: RUNNABLE
  Blocked count: 31
  Waited count: 0
  Stack:
    sun.nio.ch.PollArrayWrapper.poll0(Native Method)
    sun.nio.ch.PollArrayWrapper.poll(PollArrayWrapper.java:100)
    sun.nio.ch.PollSelectorImpl.doSelect(PollSelectorImpl.java:56)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
    org.apache.hadoop.ipc.Server$Listener.run(Server.java:230)

Thread 11 (Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=50060]):
  State: RUNNABLE
  Blocked count: 4
  Waited count: 0
  Stack:
    java.net.PlainSocketImpl.socketAccept(Native Method)
    java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
    java.net.ServerSocket.implAccept(ServerSocket.java:450)
    java.net.ServerSocket.accept(ServerSocket.java:421)
    org.mortbay.util.ThreadedServer.acceptSocket(ThreadedServer.java:432)
    org.mortbay.util.ThreadedServer$Acceptor.run(ThreadedServer.java:631)

Thread 10 (SessionScavenger):
  State: TIMED_WAITING
  Blocked count: 0
  Waited count: 0
  Stack:
    java.lang.Thread.sleep(Native Method)
    org.mortbay.jetty.servlet.AbstractSessionManager$SessionScavenger.run(AbstractSessionManager.java:587)

Thread 9 (taskCleanup):
  State: WAITING
  Blocked count: 3
  Waited count: 1024
  Waiting on null
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:118)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1767)
    java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:359)
    org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:160)
    java.lang.Thread.run(Thread.java:595)

Thread 4 (Signal Dispatcher):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:

Thread 3 (Finalizer):
  State: WAITING
  Blocked count: 431
  Waited count: 1030
  Waiting on [EMAIL PROTECTED]
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

Thread 2 (Reference Handler):
  State: WAITING
  Blocked count: 909
  Waited count: 968
  Waiting on [EMAIL PROTECTED]
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:474)
    java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)

Thread 1 (main):
  State: RUNNABLE
  Blocked count: 92
  Waited count: 13432
  Stack:
    sun.management.ThreadImpl.getThreadInfo0(Native Method)
    sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:144)
    sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:120)
    org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:102)
    org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:150)
    org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:655)
    org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:517)
    org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:857)
    org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1499)

Thanks.
