Thanks for the valuable input! It was some string concatenation I did in each map call, and it got super long. Resolving this made the problem go away!
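Johannes does not post the offending code, so what follows is only a guess at the shape of the bug, written as a minimal Java sketch. It assumes (hypothetically) that a String field on the mapper was appended to in every map() call; the names ConcatSketch, leakyMap and boundedMap are invented for illustration. Because each `+=` copies everything accumulated so far, every call is slower than the one before, which could eventually make a single call run past the task timeout without reporting progress.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the thread does not show the actual mapper code.
// Assumption: a String field outlives individual map() calls and keeps growing.
final class ConcatSketch {

    // Anti-pattern: state that survives across map() calls within one task.
    private String accumulated = "";

    // Each += allocates a new String and copies everything appended so far,
    // so the cost of one call grows with the number of records already seen.
    String leakyMap(String record) {
        accumulated += record + ";";
        return accumulated;
    }

    // Fix: build only what the current record needs, in a local StringBuilder
    // that becomes garbage as soon as the call returns.
    static String boundedMap(String record, List<String> fields) {
        StringBuilder sb = new StringBuilder(record);
        for (String field : fields) {
            sb.append('\t').append(field);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ConcatSketch mapper = new ConcatSketch();
        for (int i = 0; i < 3; i++) {
            mapper.leakyMap("rec" + i);
        }
        // The accumulated string keeps growing with every record processed.
        System.out.println(mapper.accumulated);
        // The bounded version only ever holds one record's worth of text.
        System.out.println(boundedMap("rec0", Arrays.asList("a", "b")));
    }
}
```

The bounded version keeps per-record work constant: nothing outlives the call, so the cost of a map() call no longer depends on how many records the task has already processed.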

cheers
Johannes

On May 7, 2010, at 6:35 PM, Michael Segel wrote:

>
> There are a couple of things...
>
> 1) Inside the map() method, if you're taking longer than the timeout, you
> will fail.
> 2) You could also fail if you run into GC problems that cause timeout
> failures.
>
> What sometimes happens in long-running jobs is that GC takes
> longer and longer as the job runs.
>
> Do you log a start and end time in each iteration of map()? This will tell
> you if the individual time spent in a map() method call is too long.
>
> Also look at GC. Trying to go through all the logs can really drive one to
> drink. :-)
>
> HTH
>
> -Mike
>
>
>> Subject: Re: hanging task
>> From: [email protected]
>> Date: Fri, 7 May 2010 17:59:01 +0200
>> To: [email protected]
>>
>> Sorry, forgot to say that a map task runs for more than an hour.
>> So I do not think it's the progress thing. There are successful map tasks,
>> and failing ones which fail long after the 10 min timeout.
>> And the question is why no stack of the map task is visible in
>> the thread dump.
>> Is it swallowed, or is Hadoop stuck right after the mapper is done?
>>
>> Johannes
>>
>>
>> On May 7, 2010, at 5:16 PM, Raghava Mutharaju wrote:
>>
>>> Hello Johannes,
>>>
>>> I had a similar problem and I used the first approach suggested by Joseph,
>>> i.e. reporting the status back. I used the progress() method as well as the
>>> setStatus() method. The javadoc link for progress() is given below.
>>>
>>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Progressable.html#progress%28%29
>>>
>>> This should solve your problem. If you find out why it is
>>> taking more time than you expected, please mention it here.
>>>
>>> Regards,
>>> Raghava.
>>>
>>> On Fri, May 7, 2010 at 10:31 AM, Joseph Stein <[email protected]> wrote:
>>>
>>>> You need to either report status or increment a counter from within
>>>> your task.
>>>> In your Java code there is a little trick to help the job
>>>> be “aware” within the cluster of tasks that are not dead but just
>>>> working hard. During execution of a task there is no built-in
>>>> reporting that the task is running as expected if it is not writing
>>>> output. This means that if your tasks spend a long time
>>>> doing work, the cluster may consider the task failed
>>>> (based on the mapred.task.timeout setting).
>>>>
>>>> Have no fear: there is a way to tell the cluster that your task is doing
>>>> just fine. You have two ways to do this: you can either report the status
>>>> or increment a counter. Either one lets the TaskTracker know the
>>>> task is OK, and the JobTracker will in turn see this.
>>>> Both options are explained in the Javadoc:
>>>>
>>>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reporter.html
>>>>
>>>> Here are some general pointers you may find useful:
>>>>
>>>> http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
>>>>
>>>> On Fri, May 7, 2010 at 10:26 AM, Johannes Zillmann
>>>> <[email protected]> wrote:
>>>>> Hi Hadoop folks,
>>>>>
>>>>> I'm encountering the following problem on a 4-node cluster running hadoop-0.20.2.
>>>>>
>>>>> It's a map-only job reading about 9 GB of data from outside of Hadoop: 31 map
>>>>> tasks in total, with 12 map tasks running at a time.
>>>>> The first wave of mappers finishes successfully.
>>>>> Later on, the first tasks start failing, and they do so shortly before finishing:
>>>>> Task attempt_201005061210_0002_m_000001_0 failed to report status for 602 seconds. Killing!
>>>>> Task attempt_201005061210_0002_m_000001_1 failed to report status for 600 seconds. Killing!
>>>>> Task attempt_201005061210_0002_m_000001_2 failed to report status for 602 seconds. Killing!
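The reporting approach Joseph and Raghava describe, as a minimal Java sketch. In a real 0.20 job you would call progress() and setStatus(String) on the org.apache.hadoop.mapred.Reporter passed to map(); so that the sketch runs without Hadoop on the classpath, it uses a hypothetical StatusSink interface with those two method names, plus a CountingSink test double. REPORT_EVERY is an arbitrary assumed interval, and the arithmetic loop is only a stand-in for real per-record work.

```java
import java.util.ArrayList;
import java.util.List;

final class ProgressSketch {

    // Assumed interval: report once every this many units of work.
    static final int REPORT_EVERY = 1_000;

    // Simulates one expensive map() call; the arithmetic stands in for real work.
    static long expensiveMap(int workUnits, StatusSink reporter) {
        long checksum = 0;
        for (int i = 0; i < workUnits; i++) {
            checksum += 31L * i;
            if (i % REPORT_EVERY == 0) {
                // Tells the framework the task is alive even though no output
                // has been written yet.
                reporter.progress();
                reporter.setStatus("processed " + i + " of " + workUnits);
            }
        }
        return checksum;
    }

    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        long result = expensiveMap(5_000, sink);
        System.out.println(result + " after " + sink.progressCalls + " progress() calls");
        System.out.println("last status: " + sink.statuses.get(sink.statuses.size() - 1));
    }
}

// Hypothetical stand-in for org.apache.hadoop.mapred.Reporter, which provides
// progress() (inherited from Progressable) and setStatus(String).
interface StatusSink {
    void progress();
    void setStatus(String status);
}

// Test double that records what the "mapper" reported.
final class CountingSink implements StatusSink {
    int progressCalls = 0;
    final List<String> statuses = new ArrayList<>();

    @Override
    public void progress() { progressCalls++; }

    @Override
    public void setStatus(String status) { statuses.add(status); }
}
```

Reporting progress only keeps the framework from killing a task that is slow but healthy; it does not make a slow map() call any faster, as the resolution at the top of this thread shows.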
>>>>>
>>>>> The unusual thing is that I do not find any sign of the job code in the thread
>>>>> dump that the TaskTracker takes automatically:
>>>>> ----------------------------------------------------------------
>>>>> 2010-05-05 00:59:03,515 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201004301437_0050_m_000001_0: Task attempt_201004301437_0050_m_000001_0 failed to report status for 601 seconds. Killing!
>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker: Process Thread Dump: lost task
>>>>> 34 active threads
>>>>> Thread 29555 (process reaper):
>>>>>   State: RUNNABLE
>>>>>   Blocked count: 0
>>>>>   Waited count: 0
>>>>>   Stack:
>>>>>     java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>>>     java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>>>     java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>>>> Thread 29554 (JVM Runner jvm_201004301437_0050_m_1465855495 spawned.):
>>>>>   State: WAITING
>>>>>   Blocked count: 1
>>>>>   Waited count: 2
>>>>>   Waiting on java.lang.unixproc...@34b56f69
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.lang.Object.wait(Object.java:485)
>>>>>     java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>>>     org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>>>     org.apache.hadoop.util.Shell.run(Shell.java:134)
>>>>>     org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>>>>     org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>>>>     org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>>>> Thread 29550 (Thread-15193):
>>>>>   State: WAITING
>>>>>   Blocked count: 13
>>>>>   Waited count: 14
>>>>>   Waiting on java.lang.obj...@72c09161
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.lang.Object.wait(Object.java:485)
>>>>>     org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>>>> Thread 29501 (process reaper):
>>>>>   State: RUNNABLE
>>>>>   Blocked count: 0
>>>>>   Waited count: 0
>>>>>   Stack:
>>>>>     java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>>>     java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>>>     java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>>>> Thread 29500 (JVM Runner jvm_201004301437_0050_m_861348535 spawned.):
>>>>>   State: WAITING
>>>>>   Blocked count: 1
>>>>>   Waited count: 2
>>>>>   Waiting on java.lang.unixproc...@56175cd4
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.lang.Object.wait(Object.java:485)
>>>>>     java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>>>     org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>>>     org.apache.hadoop.util.Shell.run(Shell.java:134)
>>>>>     org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>>>>     org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>>>>     org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>>>> Thread 29496 (Thread-15162):
>>>>>   State: WAITING
>>>>>   Blocked count: 13
>>>>>   Waited count: 14
>>>>>   Waiting on java.lang.obj...@3b916de2
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.lang.Object.wait(Object.java:485)
>>>>>     org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>>>> Thread 29225 (process reaper):
>>>>>   State: RUNNABLE
>>>>>   Blocked count: 0
>>>>>   Waited count: 0
>>>>>   Stack:
>>>>>     java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>>>     java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>>>     java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>>>> Thread 29224 (JVM Runner jvm_201004301437_0050_m_-205889314 spawned.):
>>>>>   State: WAITING
>>>>>   Blocked count: 1
>>>>>   Waited count: 2
>>>>>   Waiting on java.lang.unixproc...@39874d3b
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.lang.Object.wait(Object.java:485)
>>>>>     java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>>>     org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>>>     org.apache.hadoop.util.Shell.run(Shell.java:134)
>>>>>     org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>>>>     org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>>>>     org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>>>> Thread 29217 (process reaper):
>>>>>   State: RUNNABLE
>>>>>   Blocked count: 0
>>>>>   Waited count: 0
>>>>>   Stack:
>>>>>     java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>>>     java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>>>     java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>>>> Thread 29215 (JVM Runner jvm_201004301437_0050_m_-1797469636 spawned.):
>>>>>   State: WAITING
>>>>>   Blocked count: 1
>>>>>   Waited count: 2
>>>>>   Waiting on java.lang.unixproc...@2c39220f
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.lang.Object.wait(Object.java:485)
>>>>>     java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>>>     org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>>>     org.apache.hadoop.util.Shell.run(Shell.java:134)
>>>>>     org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>>>>     org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>>>>     org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>>>> Thread 29208 (Thread-15014):
>>>>>   State: WAITING
>>>>>   Blocked count: 18
>>>>>   Waited count: 14
>>>>>   Waiting on java.lang.obj...@7ddbbae2
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.lang.Object.wait(Object.java:485)
>>>>>     org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>>>> Thread 29203 (Thread-15011):
>>>>>   State: WAITING
>>>>>   Blocked count: 13
>>>>>   Waited count: 14
>>>>>   Waiting on java.lang.obj...@2dac3f6f
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.lang.Object.wait(Object.java:485)
>>>>>     org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>>>> Thread 26714 (IPC Client (47) connection to /10.0.11.64:9001 from hadoop):
>>>>>   State: TIMED_WAITING
>>>>>   Blocked count: 10075
>>>>>   Waited count: 10075
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
>>>>>     org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)
>>>>> Thread 33 (Directory/File cleanup thread):
>>>>>   State: WAITING
>>>>>   Blocked count: 2
>>>>>   Waited count: 615
>>>>>   Waiting on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@4d6c68b3
>>>>>   Stack:
>>>>>     sun.misc.Unsafe.park(Native Method)
>>>>>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>     java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>     org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.java:89)
>>>>> Thread 9 (taskCleanup):
>>>>>   State: WAITING
>>>>>   Blocked count: 30
>>>>>   Waited count: 92
>>>>>   Waiting on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@3298407f
>>>>>   Stack:
>>>>>     sun.misc.Unsafe.park(Native Method)
>>>>>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>     java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>     org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:317)
>>>>>     java.lang.Thread.run(Thread.java:619)
>>>>> Thread 32 (TaskLauncher for task):
>>>>>   State: WAITING
>>>>>   Blocked count: 127
>>>>>   Waited count: 123
>>>>>   Waiting on java.util.linkedl...@c33377
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.lang.Object.wait(Object.java:485)
>>>>>     org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601)
>>>>> Thread 31 (TaskLauncher for task):
>>>>>   State: WAITING
>>>>>   Blocked count: 3237
>>>>>   Waited count: 3147
>>>>>   Waiting on java.util.linkedl...@67001629
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.lang.Object.wait(Object.java:485)
>>>>>     org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601)
>>>>> Thread 30 (Map-events fetcher for all reduce tasks on tracker_node1t:localhost/127.0.0.1:48722):
>>>>>   State: WAITING
>>>>>   Blocked count: 712
>>>>>   Waited count: 804
>>>>>   Waiting on java.util.tree...@1fec8cf1
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.lang.Object.wait(Object.java:485)
>>>>>     org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:606)
>>>>> Thread 28 (IPC Server handler 7 on 48722):
>>>>>   State: WAITING
>>>>>   Blocked count: 10
>>>>>   Waited count: 3199
>>>>>   Waiting on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>>   Stack:
>>>>>     sun.misc.Unsafe.park(Native Method)
>>>>>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>     java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>     org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 27 (IPC Server handler 6 on 48722):
>>>>>   State: WAITING
>>>>>   Blocked count: 265
>>>>>   Waited count: 3207
>>>>>   Waiting on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>>   Stack:
>>>>>     sun.misc.Unsafe.park(Native Method)
>>>>>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>     java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>     org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 26 (IPC Server handler 5 on 48722):
>>>>>   State: WAITING
>>>>>   Blocked count: 296
>>>>>   Waited count: 3206
>>>>>   Waiting on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>>   Stack:
>>>>>     sun.misc.Unsafe.park(Native Method)
>>>>>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>     java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>     org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 25 (IPC Server handler 4 on 48722):
>>>>>   State: WAITING
>>>>>   Blocked count: 199
>>>>>   Waited count: 3202
>>>>>   Waiting on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>>   Stack:
>>>>>     sun.misc.Unsafe.park(Native Method)
>>>>>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>     java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>     org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 24 (IPC Server handler 3 on 48722):
>>>>>   State: WAITING
>>>>>   Blocked count: 216
>>>>>   Waited count: 3201
>>>>>   Waiting on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>>   Stack:
>>>>>     sun.misc.Unsafe.park(Native Method)
>>>>>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>     java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>     org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 23 (IPC Server handler 2 on 48722):
>>>>>   State: WAITING
>>>>>   Blocked count: 119
>>>>>   Waited count: 3199
>>>>>   Waiting on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>>   Stack:
>>>>>     sun.misc.Unsafe.park(Native Method)
>>>>>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>     java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>     org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 22 (IPC Server handler 1 on 48722):
>>>>>   State: WAITING
>>>>>   Blocked count: 193
>>>>>   Waited count: 3207
>>>>>   Waiting on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>>   Stack:
>>>>>     sun.misc.Unsafe.park(Native Method)
>>>>>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>     java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>     org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 21 (IPC Server handler 0 on 48722):
>>>>>   State: WAITING
>>>>>   Blocked count: 285
>>>>>   Waited count: 3203
>>>>>   Waiting on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>>   Stack:
>>>>>     sun.misc.Unsafe.park(Native Method)
>>>>>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>     java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>     org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 18 (IPC Server listener on 48722):
>>>>>   State: RUNNABLE
>>>>>   Blocked count: 0
>>>>>   Waited count: 0
>>>>>   Stack:
>>>>>     sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>>>     sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>>>     sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>>>     sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>>>     sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>>>     sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
>>>>>     org.apache.hadoop.ipc.Server$Listener.run(Server.java:318)
>>>>> Thread 20 (IPC Server Responder):
>>>>>   State: RUNNABLE
>>>>>   Blocked count: 0
>>>>>   Waited count: 0
>>>>>   Stack:
>>>>>     sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>>>     sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>>>     sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>>>     sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>>>     sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>>>     org.apache.hadoop.ipc.Server$Responder.run(Server.java:478)
>>>>> Thread 17 (Timer-0):
>>>>>   State: TIMED_WAITING
>>>>>   Blocked count: 1
>>>>>   Waited count: 12764
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.util.TimerThread.mainLoop(Timer.java:509)
>>>>>     java.util.TimerThread.run(Timer.java:462)
>>>>> Thread 16 (295726...@qtp0-0 - Acceptor0 [email protected]:50060):
>>>>>   State: RUNNABLE
>>>>>   Blocked count: 1310
>>>>>   Waited count: 1
>>>>>   Stack:
>>>>>     sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>>>     sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>>>     sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>>>     sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>>>     sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>>>     org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
>>>>>     org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
>>>>>     org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
>>>>>     org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
>>>>>     org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
>>>>> Thread 4 (Signal Dispatcher):
>>>>>   State: RUNNABLE
>>>>>   Blocked count: 0
>>>>>   Waited count: 0
>>>>>   Stack:
>>>>> Thread 3 (Finalizer):
>>>>>   State: WAITING
>>>>>   Blocked count: 1337
>>>>>   Waited count: 1336
>>>>>   Waiting on java.lang.ref.referencequeue$l...@5e77533b
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>>>>>     java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>>>>>     java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>>>> Thread 2 (Reference Handler):
>>>>>   State: WAITING
>>>>>   Blocked count: 1544
>>>>>   Waited count: 1336
>>>>>   Waiting on java.lang.ref.reference$l...@46efbdf1
>>>>>   Stack:
>>>>>     java.lang.Object.wait(Native Method)
>>>>>     java.lang.Object.wait(Object.java:485)
>>>>>     java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>>>>> Thread 1 (main):
>>>>>   State: RUNNABLE
>>>>>   Blocked count: 152594
>>>>>   Waited count: 279661
>>>>>   Stack:
>>>>>     sun.management.ThreadImpl.getThreadInfo0(Native Method)
>>>>>     sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:147)
>>>>>     sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:123)
>>>>>     org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
>>>>>     org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:203)
>>>>>     org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:1323)
>>>>>     org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1106)
>>>>>     org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
>>>>>     org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
>>>>>
>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201004301437_0050_m_000001_0
>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 1
>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201004301437_0050_m_000001_0 done; removing files.
>>>>> 2010-05-05 00:59:03,521 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201004301437_0050_m_000001_0 not found in cache
>>>>> 2010-05-05 00:59:05,012 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201004301437_0050_m_-205889314 exited. Number of tasks it ran: 0
>>>>> 2010-05-05 00:59:06,521 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201004301437_0050_m_000001_0 done; removing files.
>>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201004301437_0050_m_000001_0 task's state:FAILED_UNCLEAN
>>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201004301437_0050_m_000001_0
>>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201004301437_0050_m_000001_0
>>>>> 2010-05-05 00:59:08,606 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201004301437_0050_m_-1095145752 given task: attempt_201004301437_0050_m_000001_0
>>>>> 2010-05-05 00:59:09,222 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201004301437_0050_m_000001_0 0.0%
>>>>> 2010-05-05 00:59:09,560 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201004301437_0050_m_000001_0 0.0% cleanup
>>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201004301437_0050_m_000001_0 is done.
>>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201004301437_0050_m_000001_0 was 0
>>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201004301437_0050_m_000001_0 done; removing files.
>>>>> ----------------------------------------------------------------
>>>>>
>>>>> The syslog of the attempt looks like this:
>>>>> ----------------------------------------------------------------
>>>>> 2010-05-04 23:48:03,365 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
>>>>> 2010-05-04 23:48:03,674 WARN datameer.dap.sdk.util.ManifestMetaData: Failed to get version from the manifest file of jar 'file:/data/drive0/mapred/tmp/taskTracker/archive/nurago64.local/das/jobjars/61a2133b7887b9be48d42e49002c85e0/stripped-dap-0.24.dev-job.jar/stripped-dap-0.24.dev-job.jar' : null
>>>>> 2010-05-04 23:48:04,691 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
>>>>> 2010-05-04 23:48:04,872 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>>> 2010-05-04 23:48:04,873 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
>>>>> 2010-05-04 23:48:04,874 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
>>>>> 2010-05-05 00:59:08,753 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=CLEANUP, sessionId=
>>>>> 2010-05-05 00:59:09,223 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
>>>>> 2010-05-05 00:59:09,442 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201004301437_0050_m_000001_0 is done. And is in the process of commiting
>>>>> 2010-05-05 00:59:09,562 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201004301437_0050_m_000001_0' done.
>>>>> ----------------------------------------------------------------
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> best regards
>>>>> Johannes
>>>>
>>>> --
>>>> /*
>>>> Joe Stein
>>>> http://www.linkedin.com/in/charmalloc
>>>> */
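
Mike's two checks (how long each map() call takes, and whether GC time grows as the job runs) can be folded into one wrapper around the body of map(). This is a minimal Java sketch using the standard java.lang.management API; CallTimer, timeCall and the SLOW_CALL_MILLIS threshold are hypothetical, and in a real mapper you would write to the task's log rather than stdout. It also has to run inside the task: judging by the stack traces, the dump quoted above was printed by the TaskTracker's own JVM (its main thread is in TaskTracker.markUnresponsiveTasks), while the map task ran in a separate child JVM launched through JvmManager, which would explain why no mapper frames appear in it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

final class CallTimer {

    // Assumed threshold for flagging a single map() call as suspiciously slow.
    static final long SLOW_CALL_MILLIS = 60_000;

    // Sum of accumulated GC time over all collectors in this JVM, in ms.
    // getCollectionTime() returns -1 when a collector doesn't report it; skip those.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Runs one unit of work (e.g. the body of map()) and logs its wall time
    // plus the GC time that accumulated while it ran.
    static <T> T timeCall(String label, Supplier<T> body) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            long gcMs = totalGcMillis() - gcBefore;
            String flag = elapsedMs > SLOW_CALL_MILLIS ? " SLOW" : "";
            System.out.println(label + ": " + elapsedMs + " ms, GC " + gcMs + " ms" + flag);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            final int n = i;
            // Placeholder work standing in for one map() call.
            Integer length = timeCall("map #" + n, () -> {
                StringBuilder sb = new StringBuilder();
                for (int j = 0; j < 100_000; j++) {
                    sb.append(n);
                }
                return sb.length();
            });
            System.out.println("result length " + length);
        }
    }
}
```

If the per-call wall time keeps climbing while the GC share stays small, the map() body itself is getting slower (as with the concatenation in this thread); if the GC share climbs with it, heap pressure is the more likely culprit.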
