Hi Raghava, it was a specific business-logic problem, not a generally applicable one!
Johannes

On May 28, 2010, at 6:18 PM, Raghava Mutharaju wrote:
> Hello Johannes,
>
> Is this about the common tip of using StringBuilder/StringBuffer
> instead of '+' when concatenating Strings? I would like to know more about
> the problem and how you resolved it.
>
> Regards,
> Raghava.
>
> On Fri, May 28, 2010 at 6:43 AM, Johannes Zillmann <[email protected]> wrote:
>
>> Thanks for the valuable input!
>> It was some string concatenation I did in each map call, and the string got
>> super long. Resolving this made the problem go away!
>>
>> cheers
>> Johannes
>>
>> On May 7, 2010, at 6:35 PM, Michael Segel wrote:
>>
>>> There are a couple of things...
>>>
>>> 1) Inside the map() method, if you're taking longer than the timeout, you
>>> will fail.
>>> 2) You could also fail if you run into GC problems that cause timeout
>>> failures.
>>>
>>> In long-running jobs, you sometimes see GC take longer and longer as the
>>> job runs.
>>>
>>> Do you log a start and end time in each iteration of map()? This will
>>> tell you whether the time spent in an individual map() call is too long.
>>>
>>> Also look at GC. Trying to go through all the logs can really drive one
>>> to drink. :-)
>>>
>>> HTH
>>>
>>> -Mike
>>>
>>>> Subject: Re: hanging task
>>>> From: [email protected]
>>>> Date: Fri, 7 May 2010 17:59:01 +0200
>>>> To: [email protected]
>>>>
>>>> Sorry, I forgot to say that one map task runs for more than an hour,
>>>> so I do not think it's the progress thing. There are successful map
>>>> tasks, and the failing ones fail long after the 10-minute timeout.
>>>> So the question is why no stack of the map task is visible
>>>> in the thread dump.
>>>> Is it swallowed, or is Hadoop stuck right after the mapper is done?
>>>>
>>>> Johannes
>>>>
>>>> On May 7, 2010, at 5:16 PM, Raghava Mutharaju wrote:
>>>>
>>>>> Hello Johannes,
>>>>>
>>>>> I had a similar problem and I used the first approach suggested by
>>>>> Joseph, i.e. to report the status back. I used the progress() method
>>>>> as well as the setStatus() method. The Javadoc link for progress() is
>>>>> given below:
>>>>>
>>>>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Progressable.html#progress%28%29
>>>>>
>>>>> This should solve your problem. If you find out why it is
>>>>> taking more time than you expected, please mention it here.
>>>>>
>>>>> Regards,
>>>>> Raghava.
>>>>>
>>>>> On Fri, May 7, 2010 at 10:31 AM, Joseph Stein <[email protected]> wrote:
>>>>>
>>>>>> You need to either report status or increment a counter from within
>>>>>> your task. In your Java code there is a little trick to help the job
>>>>>> be “aware” within the cluster of tasks that are not dead but just
>>>>>> working hard. During execution of a task there is no built-in
>>>>>> reporting that the task is running as expected if it is neither
>>>>>> reading input nor writing output. This means that if your tasks spend
>>>>>> a lot of time doing work, the cluster may see such a task as failed
>>>>>> (based on the mapred.task.timeout setting, 600000 ms by default).
>>>>>>
>>>>>> Have no fear: there is a way to tell the cluster that your task is
>>>>>> doing just fine. You have two ways to do this: you can either report
>>>>>> the status or increment a counter. Either one lets the TaskTracker
>>>>>> know the task is OK, and the JobTracker sees this in turn.
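The two options Joseph describes (report status or increment a counter), plus the progress() call Raghava mentions, amount to signalling liveness every N units of work inside one long call. The following is a minimal sketch under stated assumptions, not Hadoop code: StatusSink, WorkCounter, processSlowly and doExpensiveWork are hypothetical names. StatusSink only mirrors three methods of Hadoop's org.apache.hadoop.mapred.Reporter (progress(), setStatus(String), incrCounter(Enum, long)) so that the example compiles without Hadoop on the classpath.

```java
/**
 * Sketch: keep a long-running map() call "alive" by reporting periodically.
 * StatusSink is a HYPOTHETICAL stand-in mirroring a subset of
 * org.apache.hadoop.mapred.Reporter; in a real mapper you would call the
 * same methods on the Reporter passed to map().
 */
public class ProgressSketch {

    /** Hypothetical mirror of Reporter.progress(), setStatus(String), incrCounter(Enum, long). */
    interface StatusSink {
        void progress();
        void setStatus(String status);
        void incrCounter(Enum<?> key, long amount);
    }

    /** Hypothetical counter used only in this sketch. */
    enum WorkCounter { ITEMS_PROCESSED }

    /**
     * Processes `items` units of work inside a single call and reports back
     * every `reportEvery` units, so the gap between two liveness signals
     * stays well below the task timeout.
     */
    static void processSlowly(int items, int reportEvery, StatusSink reporter) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        long pending = 0; // units finished since the last counter update
        for (int i = 1; i <= items; i++) {
            doExpensiveWork(i);
            pending++;
            if (i % reportEvery == 0) {
                reporter.progress();                                // "I'm alive"
                reporter.setStatus("processed " + i + " of " + items); // visible in the web UI
                reporter.incrCounter(WorkCounter.ITEMS_PROCESSED, pending);
                pending = 0;
            }
        }
        if (pending > 0) {
            // Flush the remainder so the counter matches the real item count.
            reporter.incrCounter(WorkCounter.ITEMS_PROCESSED, pending);
        }
    }

    /** Hypothetical placeholder for the expensive per-item business logic. */
    static void doExpensiveWork(int item) {
        // real work would go here
    }
}
```

The reporting interval is a judgment call: it only has to keep the time between two reports comfortably under the timeout, while reporting on every single item would add needless overhead.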
>>>>>> Both of these options are explained in the Javadoc:
>>>>>>
>>>>>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reporter.html
>>>>>>
>>>>>> Here are some general pointers you may find useful:
>>>>>>
>>>>>> http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
>>>>>>
>>>>>> On Fri, May 7, 2010 at 10:26 AM, Johannes Zillmann
>>>>>> <[email protected]> wrote:
>>>>>>> Hi Hadoop folks,
>>>>>>>
>>>>>>> I'm encountering the following problem on a 4-node cluster running
>>>>>>> hadoop-0.20.2.
>>>>>>>
>>>>>>> It's a map-only job reading about 9 GB of data from outside of Hadoop:
>>>>>>> 31 map tasks in total, with 12 map tasks running at a time.
>>>>>>> The first wave of mappers finishes successfully.
>>>>>>> Later on, the first tasks start failing, shortly before they would
>>>>>>> finish:
>>>>>>> Task attempt_201005061210_0002_m_000001_0 failed to report status for 602 seconds. Killing!
>>>>>>> Task attempt_201005061210_0002_m_000001_1 failed to report status for 600 seconds. Killing!
>>>>>>> Task attempt_201005061210_0002_m_000001_2 failed to report status for 602 seconds. Killing!
>>>>>>>
>>>>>>> The unusual thing is that I do not find any sign of the job code in
>>>>>>> the thread dump the TaskTracker takes automatically:
>>>>>>> ----------------------------------------------------------------
>>>>>>> 2010-05-05 00:59:03,515 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201004301437_0050_m_000001_0: Task attempt_201004301437_0050_m_000001_0 failed to report status for 601 seconds. Killing!
>>>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>> Process Thread Dump: lost task >>>>>>> 34 active threads >>>>>>> Thread 29555 (process reaper): >>>>>>> State: RUNNABLE >>>>>>> Blocked count: 0 >>>>>>> Waited count: 0 >>>>>>> Stack: >>>>>>> java.lang.UNIXProcess.waitForProcessExit(Native Method) >>>>>>> java.lang.UNIXProcess.access$900(UNIXProcess.java:20) >>>>>>> java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132) >>>>>>> Thread 29554 (JVM Runner jvm_201004301437_0050_m_1465855495 >> spawned.): >>>>>>> State: WAITING >>>>>>> Blocked count: 1 >>>>>>> Waited count: 2 >>>>>>> Waiting on java.lang.unixproc...@34b56f69 >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> java.lang.Object.wait(Object.java:485) >>>>>>> java.lang.UNIXProcess.waitFor(UNIXProcess.java:165) >>>>>>> org.apache.hadoop.util.Shell.runCommand(Shell.java:186) >>>>>>> org.apache.hadoop.util.Shell.run(Shell.java:134) >>>>>>> >>>>>> >> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286) >>>>>>> >>>>>> >> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335) >>>>>>> >>>>>> >> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324) >>>>>>> Thread 29550 (Thread-15193): >>>>>>> State: WAITING >>>>>>> Blocked count: 13 >>>>>>> Waited count: 14 >>>>>>> Waiting on java.lang.obj...@72c09161 >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> java.lang.Object.wait(Object.java:485) >>>>>>> org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409) >>>>>>> Thread 29501 (process reaper): >>>>>>> State: RUNNABLE >>>>>>> Blocked count: 0 >>>>>>> Waited count: 0 >>>>>>> Stack: >>>>>>> java.lang.UNIXProcess.waitForProcessExit(Native Method) >>>>>>> java.lang.UNIXProcess.access$900(UNIXProcess.java:20) >>>>>>> java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132) >>>>>>> Thread 29500 (JVM Runner jvm_201004301437_0050_m_861348535 spawned.): >>>>>>> State: 
WAITING >>>>>>> Blocked count: 1 >>>>>>> Waited count: 2 >>>>>>> Waiting on java.lang.unixproc...@56175cd4 >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> java.lang.Object.wait(Object.java:485) >>>>>>> java.lang.UNIXProcess.waitFor(UNIXProcess.java:165) >>>>>>> org.apache.hadoop.util.Shell.runCommand(Shell.java:186) >>>>>>> org.apache.hadoop.util.Shell.run(Shell.java:134) >>>>>>> >>>>>> >> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286) >>>>>>> >>>>>> >> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335) >>>>>>> >>>>>> >> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324) >>>>>>> Thread 29496 (Thread-15162): >>>>>>> State: WAITING >>>>>>> Blocked count: 13 >>>>>>> Waited count: 14 >>>>>>> Waiting on java.lang.obj...@3b916de2 >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> java.lang.Object.wait(Object.java:485) >>>>>>> org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409) >>>>>>> Thread 29225 (process reaper): >>>>>>> State: RUNNABLE >>>>>>> Blocked count: 0 >>>>>>> Waited count: 0 >>>>>>> Stack: >>>>>>> java.lang.UNIXProcess.waitForProcessExit(Native Method) >>>>>>> java.lang.UNIXProcess.access$900(UNIXProcess.java:20) >>>>>>> java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132) >>>>>>> Thread 29224 (JVM Runner jvm_201004301437_0050_m_-205889314 >> spawned.): >>>>>>> State: WAITING >>>>>>> Blocked count: 1 >>>>>>> Waited count: 2 >>>>>>> Waiting on java.lang.unixproc...@39874d3b >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> java.lang.Object.wait(Object.java:485) >>>>>>> java.lang.UNIXProcess.waitFor(UNIXProcess.java:165) >>>>>>> org.apache.hadoop.util.Shell.runCommand(Shell.java:186) >>>>>>> org.apache.hadoop.util.Shell.run(Shell.java:134) >>>>>>> >>>>>> >> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286) >>>>>>> >>>>>> >> 
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335) >>>>>>> >>>>>> >> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324) >>>>>>> Thread 29217 (process reaper): >>>>>>> State: RUNNABLE >>>>>>> Blocked count: 0 >>>>>>> Waited count: 0 >>>>>>> Stack: >>>>>>> java.lang.UNIXProcess.waitForProcessExit(Native Method) >>>>>>> java.lang.UNIXProcess.access$900(UNIXProcess.java:20) >>>>>>> java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132) >>>>>>> Thread 29215 (JVM Runner jvm_201004301437_0050_m_-1797469636 >> spawned.): >>>>>>> State: WAITING >>>>>>> Blocked count: 1 >>>>>>> Waited count: 2 >>>>>>> Waiting on java.lang.unixproc...@2c39220f >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> java.lang.Object.wait(Object.java:485) >>>>>>> java.lang.UNIXProcess.waitFor(UNIXProcess.java:165) >>>>>>> org.apache.hadoop.util.Shell.runCommand(Shell.java:186) >>>>>>> org.apache.hadoop.util.Shell.run(Shell.java:134) >>>>>>> >>>>>> >> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286) >>>>>>> >>>>>> >> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335) >>>>>>> >>>>>> >> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324) >>>>>>> Thread 29208 (Thread-15014): >>>>>>> State: WAITING >>>>>>> Blocked count: 18 >>>>>>> Waited count: 14 >>>>>>> Waiting on java.lang.obj...@7ddbbae2 >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> java.lang.Object.wait(Object.java:485) >>>>>>> org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409) >>>>>>> Thread 29203 (Thread-15011): >>>>>>> State: WAITING >>>>>>> Blocked count: 13 >>>>>>> Waited count: 14 >>>>>>> Waiting on java.lang.obj...@2dac3f6f >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> java.lang.Object.wait(Object.java:485) >>>>>>> org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409) >>>>>>> Thread 
26714 (IPC Client (47) connection to /10.0.11.64:9001 from >>>>>> hadoop): >>>>>>> State: TIMED_WAITING >>>>>>> Blocked count: 10075 >>>>>>> Waited count: 10075 >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403) >>>>>>> org.apache.hadoop.ipc.Client$Connection.run(Client.java:445) >>>>>>> Thread 33 (Directory/File cleanup thread): >>>>>>> State: WAITING >>>>>>> Blocked count: 2 >>>>>>> Waited count: 615 >>>>>>> Waiting on >>>>>> >> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@4d6c68b3 >>>>>>> Stack: >>>>>>> sun.misc.Unsafe.park(Native Method) >>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) >>>>>>> >>>>>> >> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) >>>>>>> >>>>>> >> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) >>>>>>> >>>>>> >> org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.java:89) >>>>>>> Thread 9 (taskCleanup): >>>>>>> State: WAITING >>>>>>> Blocked count: 30 >>>>>>> Waited count: 92 >>>>>>> Waiting on >>>>>> >> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@3298407f >>>>>>> Stack: >>>>>>> sun.misc.Unsafe.park(Native Method) >>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) >>>>>>> >>>>>> >> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) >>>>>>> >>>>>> >> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) >>>>>>> org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:317) >>>>>>> java.lang.Thread.run(Thread.java:619) >>>>>>> Thread 32 (TaskLauncher for task): >>>>>>> State: WAITING >>>>>>> Blocked count: 127 >>>>>>> Waited count: 123 >>>>>>> Waiting on java.util.linkedl...@c33377 >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> 
java.lang.Object.wait(Object.java:485) >>>>>>> >>>>>> >> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601) >>>>>>> Thread 31 (TaskLauncher for task): >>>>>>> State: WAITING >>>>>>> Blocked count: 3237 >>>>>>> Waited count: 3147 >>>>>>> Waiting on java.util.linkedl...@67001629 >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> java.lang.Object.wait(Object.java:485) >>>>>>> >>>>>> >> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601) >>>>>>> Thread 30 (Map-events fetcher for all reduce tasks on >>>>>> tracker_node1t:localhost/127.0.0.1:48722): >>>>>>> State: WAITING >>>>>>> Blocked count: 712 >>>>>>> Waited count: 804 >>>>>>> Waiting on java.util.tree...@1fec8cf1 >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> java.lang.Object.wait(Object.java:485) >>>>>>> >>>>>> >> org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:606) >>>>>>> Thread 28 (IPC Server handler 7 on 48722): >>>>>>> State: WAITING >>>>>>> Blocked count: 10 >>>>>>> Waited count: 3199 >>>>>>> Waiting on >>>>>> >> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae >>>>>>> Stack: >>>>>>> sun.misc.Unsafe.park(Native Method) >>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) >>>>>>> >>>>>> >> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) >>>>>>> >>>>>> >> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) >>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) >>>>>>> Thread 27 (IPC Server handler 6 on 48722): >>>>>>> State: WAITING >>>>>>> Blocked count: 265 >>>>>>> Waited count: 3207 >>>>>>> Waiting on >>>>>> >> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae >>>>>>> Stack: >>>>>>> sun.misc.Unsafe.park(Native Method) >>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) >>>>>>> >>>>>> >> 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) >>>>>>> >>>>>> >> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) >>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) >>>>>>> Thread 26 (IPC Server handler 5 on 48722): >>>>>>> State: WAITING >>>>>>> Blocked count: 296 >>>>>>> Waited count: 3206 >>>>>>> Waiting on >>>>>> >> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae >>>>>>> Stack: >>>>>>> sun.misc.Unsafe.park(Native Method) >>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) >>>>>>> >>>>>> >> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) >>>>>>> >>>>>> >> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) >>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) >>>>>>> Thread 25 (IPC Server handler 4 on 48722): >>>>>>> State: WAITING >>>>>>> Blocked count: 199 >>>>>>> Waited count: 3202 >>>>>>> Waiting on >>>>>> >> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae >>>>>>> Stack: >>>>>>> sun.misc.Unsafe.park(Native Method) >>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) >>>>>>> >>>>>> >> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) >>>>>>> >>>>>> >> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) >>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) >>>>>>> Thread 24 (IPC Server handler 3 on 48722): >>>>>>> State: WAITING >>>>>>> Blocked count: 216 >>>>>>> Waited count: 3201 >>>>>>> Waiting on >>>>>> >> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae >>>>>>> Stack: >>>>>>> sun.misc.Unsafe.park(Native Method) >>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) >>>>>>> >>>>>> >> 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) >>>>>>> >>>>>> >> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) >>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) >>>>>>> Thread 23 (IPC Server handler 2 on 48722): >>>>>>> State: WAITING >>>>>>> Blocked count: 119 >>>>>>> Waited count: 3199 >>>>>>> Waiting on >>>>>> >> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae >>>>>>> Stack: >>>>>>> sun.misc.Unsafe.park(Native Method) >>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) >>>>>>> >>>>>> >> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) >>>>>>> >>>>>> >> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) >>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) >>>>>>> Thread 22 (IPC Server handler 1 on 48722): >>>>>>> State: WAITING >>>>>>> Blocked count: 193 >>>>>>> Waited count: 3207 >>>>>>> Waiting on >>>>>> >> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae >>>>>>> Stack: >>>>>>> sun.misc.Unsafe.park(Native Method) >>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) >>>>>>> >>>>>> >> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) >>>>>>> >>>>>> >> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) >>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) >>>>>>> Thread 21 (IPC Server handler 0 on 48722): >>>>>>> State: WAITING >>>>>>> Blocked count: 285 >>>>>>> Waited count: 3203 >>>>>>> Waiting on >>>>>> >> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae >>>>>>> Stack: >>>>>>> sun.misc.Unsafe.park(Native Method) >>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) >>>>>>> >>>>>> >> 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) >>>>>>> >>>>>> >> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) >>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) >>>>>>> Thread 18 (IPC Server listener on 48722): >>>>>>> State: RUNNABLE >>>>>>> Blocked count: 0 >>>>>>> Waited count: 0 >>>>>>> Stack: >>>>>>> sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) >>>>>>> sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) >>>>>>> sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) >>>>>>> sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) >>>>>>> sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) >>>>>>> sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84) >>>>>>> org.apache.hadoop.ipc.Server$Listener.run(Server.java:318) >>>>>>> Thread 20 (IPC Server Responder): >>>>>>> State: RUNNABLE >>>>>>> Blocked count: 0 >>>>>>> Waited count: 0 >>>>>>> Stack: >>>>>>> sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) >>>>>>> sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) >>>>>>> sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) >>>>>>> sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) >>>>>>> sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) >>>>>>> org.apache.hadoop.ipc.Server$Responder.run(Server.java:478) >>>>>>> Thread 17 (Timer-0): >>>>>>> State: TIMED_WAITING >>>>>>> Blocked count: 1 >>>>>>> Waited count: 12764 >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> java.util.TimerThread.mainLoop(Timer.java:509) >>>>>>> java.util.TimerThread.run(Timer.java:462) >>>>>>> Thread 16 (295726...@qtp0-0 - Acceptor0 >>>>>> [email protected]:50060): >>>>>>> State: RUNNABLE >>>>>>> Blocked count: 1310 >>>>>>> Waited count: 1 >>>>>>> Stack: >>>>>>> sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) >>>>>>> sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) >>>>>>> 
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) >>>>>>> sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) >>>>>>> sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) >>>>>>> >>>>>> >> org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429) >>>>>>> >> org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185) >>>>>>> >>>>>> >> org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124) >>>>>>> >>>>>> >> org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707) >>>>>>> >>>>>> >> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) >>>>>>> Thread 4 (Signal Dispatcher): >>>>>>> State: RUNNABLE >>>>>>> Blocked count: 0 >>>>>>> Waited count: 0 >>>>>>> Stack: >>>>>>> Thread 3 (Finalizer): >>>>>>> State: WAITING >>>>>>> Blocked count: 1337 >>>>>>> Waited count: 1336 >>>>>>> Waiting on java.lang.ref.referencequeue$l...@5e77533b >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116) >>>>>>> java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132) >>>>>>> java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) >>>>>>> Thread 2 (Reference Handler): >>>>>>> State: WAITING >>>>>>> Blocked count: 1544 >>>>>>> Waited count: 1336 >>>>>>> Waiting on java.lang.ref.reference$l...@46efbdf1 >>>>>>> Stack: >>>>>>> java.lang.Object.wait(Native Method) >>>>>>> java.lang.Object.wait(Object.java:485) >>>>>>> java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) >>>>>>> Thread 1 (main): >>>>>>> State: RUNNABLE >>>>>>> Blocked count: 152594 >>>>>>> Waited count: 279661 >>>>>>> Stack: >>>>>>> sun.management.ThreadImpl.getThreadInfo0(Native Method) >>>>>>> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:147) >>>>>>> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:123) >>>>>>> >>>>>> >> 
org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149) >>>>>>> >>>>>> >> org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:203) >>>>>>> >>>>>> >> org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:1323) >>>>>>> >>>>>> >> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1106) >>>>>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720) >>>>>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833) >>>>>>> >>>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker: >> About >>>>>> to purge task: attempt_201004301437_0050_m_000001_0 >>>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker: >>>>>> addFreeSlot : current free slots : 1 >>>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskRunner: >>>>>> attempt_201004301437_0050_m_000001_0 done; removing files. >>>>>>> 2010-05-05 00:59:03,521 INFO org.apache.hadoop.mapred.IndexCache: Map >> ID >>>>>> attempt_201004301437_0050_m_000001_0 not found in cache >>>>>>> 2010-05-05 00:59:05,012 INFO org.apache.hadoop.mapred.JvmManager: JVM >> : >>>>>> jvm_201004301437_0050_m_-205889314 exited. Number of tasks it ran: 0 >>>>>>> 2010-05-05 00:59:06,521 INFO org.apache.hadoop.mapred.TaskRunner: >>>>>> attempt_201004301437_0050_m_000001_0 done; removing files. 
>>>>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201004301437_0050_m_000001_0 task's state:FAILED_UNCLEAN
>>>>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201004301437_0050_m_000001_0
>>>>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201004301437_0050_m_000001_0
>>>>>>> 2010-05-05 00:59:08,606 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201004301437_0050_m_-1095145752 given task: attempt_201004301437_0050_m_000001_0
>>>>>>> 2010-05-05 00:59:09,222 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201004301437_0050_m_000001_0 0.0%
>>>>>>> 2010-05-05 00:59:09,560 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201004301437_0050_m_000001_0 0.0% cleanup
>>>>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201004301437_0050_m_000001_0 is done.
>>>>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201004301437_0050_m_000001_0 was 0
>>>>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201004301437_0050_m_000001_0 done; removing files.
>>>>>>> ----------------------------------------------------------------
>>>>>>>
>>>>>>> The syslog of the attempt just looks like this:
>>>>>>> ----------------------------------------------------------------
>>>>>>> 2010-05-04 23:48:03,365 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
>>>>>>> 2010-05-04 23:48:03,674 WARN datameer.dap.sdk.util.ManifestMetaData: Failed to get version from the manifest file of jar 'file:/data/drive0/mapred/tmp/taskTracker/archive/nurago64.local/das/jobjars/61a2133b7887b9be48d42e49002c85e0/stripped-dap-0.24.dev-job.jar/stripped-dap-0.24.dev-job.jar' : null
>>>>>>> 2010-05-04 23:48:04,691 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
>>>>>>> 2010-05-04 23:48:04,872 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>>>>> 2010-05-04 23:48:04,873 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
>>>>>>> 2010-05-04 23:48:04,874 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
>>>>>>> 2010-05-05 00:59:08,753 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=CLEANUP, sessionId=
>>>>>>> 2010-05-05 00:59:09,223 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
>>>>>>> 2010-05-05 00:59:09,442 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201004301437_0050_m_000001_0 is done. And is in the process of commiting
>>>>>>> 2010-05-05 00:59:09,562 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201004301437_0050_m_000001_0' done.
>>>>>>> ----------------------------------------------------------------
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> best regards
>>>>>>> Johannes
>>>>>>
>>>>>> --
>>>>>> /*
>>>>>> Joe Stein
>>>>>> http://www.linkedin.com/in/charmalloc
>>>>>> */
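Michael Segel's two diagnostics in this thread, logging the start and end time of each map() call and looking at GC, can be combined in one small helper. This is a minimal sketch, not Hadoop code: MapCallTimer, Timing, time and nearTimeout are hypothetical names, and the only real API used is the standard java.lang.management one (GarbageCollectorMXBean.getCollectionTime()). The 600000 ms in the test is an assumption matching Hadoop's default mapred.task.timeout and the roughly 600 seconds seen in the logs above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Sketch: time one unit of work (e.g. one map() call) and report how much
 * GC time the JVM accumulated while it ran. All names are HYPOTHETICAL.
 */
public class MapCallTimer {

    /** Wall-clock and GC time of one call, both in milliseconds. */
    static final class Timing {
        final long wallMillis;
        final long gcMillis;

        Timing(long wallMillis, long gcMillis) {
            this.wallMillis = wallMillis;
            this.gcMillis = gcMillis;
        }

        @Override
        public String toString() {
            return "wall=" + wallMillis + "ms gc=" + gcMillis + "ms";
        }
    }

    /** Accumulated GC time across all collectors; only positive values are added, so -1 ("undefined") is ignored. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Runs the work and returns its wall time plus the JVM-wide GC time accumulated meanwhile. */
    static Timing time(Runnable work) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        work.run();
        long wallMillis = (System.nanoTime() - start) / 1_000_000L;
        return new Timing(wallMillis, totalGcMillis() - gcBefore);
    }

    /** True if one call used more than `fraction` of the task timeout. */
    static boolean nearTimeout(Timing t, long timeoutMillis, double fraction) {
        return t.wallMillis > timeoutMillis * fraction;
    }
}
```

Inside a mapper you would wrap the body of map() and log the resulting Timing (checked exceptions from map() would need wrapping, since Runnable cannot throw them). A single call approaching the timeout points at Michael's first case; gcMillis growing from call to call points at his second.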
