Thanks for the valuable input!
It was some string concatenation i did in each map call and it got super long. 
Resolving this made the problem go away!

cheers
Johannes 

On May 7, 2010, at 6:35 PM, Michael Segel wrote:

> 
> There's a couple of things...
> 
> 1) Inside the map() method, if you're taking longer than the timeout, you 
> will fail.
> 2) You could also fail if you run in to GC problems that cause timeout 
> failures.
> 
> What sometimes happens in long running jobs, you tend to see your GC take 
> longer as the job runs.
> 
> Do you log a start and end time in each iteration of map()? This will tell 
> you if the individual time spent in a map() method call is too long. 
> 
> Also look at GC. Trying to go through all the logs can really drive one to 
> drink. :-)
> 
> HTH
> 
> -Mike
> 
> 
>> Subject: Re: hanging task
>> From: [email protected]
>> Date: Fri, 7 May 2010 17:59:01 +0200
>> To: [email protected]
>> 
>> Sorry, forgot to say that on map task runs about more then one hour.
>> So i do not think its the progress thing. So there are successful map tasks 
>> and those which are failing failing long after the 10 min timeout.
>> And the question is why there is not any stack of the map task visibly in 
>> the thread dump.
>> Is it swallowed or is hadoop stuck right after the mapper is done ?
>> 
>> Johannes
>> 
>> 
>> On May 7, 2010, at 5:16 PM, Raghava Mutharaju wrote:
>> 
>>> Hello Johannes,
>>> 
>>> I had a similar problem and I used the first approach suggested by Joseph
>>> i.e. to report the status back. I used progress() method as well as the
>>> setStatus() method. progress() method javadoc link is given below.
>>> 
>>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Progressable.html#progress%28%29
>>> 
>>> This should solve your problem. If you come to know of the reason why it is
>>> taking more time than you expected, please mention it here.
>>> 
>>> Regards,
>>> Raghava.
>>> 
>>> On Fri, May 7, 2010 at 10:31 AM, Joseph Stein <[email protected]> wrote:
>>> 
>>>> You need to either report status or increment a counter from within
>>>> your task. In your Java code there is a little trick to help the job
>>>> be “aware” within the cluster of tasks that are not dead but just
>>>> working hard.  During execution of a task there is no built in
>>>> reporting that the job is running as expected if it is not writing
>>>> out.  So this means that if your tasks are taking up a lot of time
>>>> doing work it is possible the cluster will see that task as failed
>>>> (based on the mapred.task.tracker.expiry.interval setting).
>>>> 
>>>> Have no fear there is a way to tell cluster that your task is doing
>>>> just fine.  You have 2 ways todo this you can either report the status
>>>> or increment a counter.  Both of these will cause the task tracker to
>>>> properly know the task is ok and this will get seen by the jobtracker
>>>> in turn.  Both of these options are explained in the JavaDoc
>>>> 
>>>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reporter.html
>>>> 
>>>> Here are some pointers in general you may find useful
>>>> 
>>>> http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
>>>> 
>>>> On Fri, May 7, 2010 at 10:26 AM, Johannes Zillmann
>>>> <[email protected]> wrote:
>>>>> Hi hadoop folks,
>>>>> 
>>>>> i'm encountering following problem on a 4 node cluster running
>>>> hadoop-0.20.2.
>>>>> 
>>>>> Its a map only job reading about 9 GB data from outside of hadoop. 31 map
>>>> tasks at all while 12 map tasks running at a time.
>>>>> The first wave of mappers finishes successfully.
>>>>> Later on the first tasks are failing and they do that shortly before
>>>> finishing:
>>>>> Task attempt_201005061210_0002_m_000001_0 failed to report status for 602
>>>> seconds. Killing!
>>>>> Task attempt_201005061210_0002_m_000001_1 failed to report status for 600
>>>> seconds. Killing!
>>>>> Task attempt_201005061210_0002_m_000001_2 failed to report status for 602
>>>> seconds. Killing!
>>>>> 
>>>>> The unusual is that i do not find any signs of the job code in the thread
>>>> dump the tasktracker takes automatically:
>>>>> ----------------------------------------------------------------
>>>>> 2010-05-05 00:59:03,515 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> attempt_201004301437_0050_m_000001_0: Task
>>>> attempt_201004301437_0050_m_000001_0 failed to report status for 601
>>>> seconds. Killing!
>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> Process Thread Dump: lost task
>>>>> 34 active threads
>>>>> Thread 29555 (process reaper):
>>>>> State: RUNNABLE
>>>>> Blocked count: 0
>>>>> Waited count: 0
>>>>> Stack:
>>>>>  java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>>>  java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>>>  java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>>>> Thread 29554 (JVM Runner jvm_201004301437_0050_m_1465855495 spawned.):
>>>>> State: WAITING
>>>>> Blocked count: 1
>>>>> Waited count: 2
>>>>> Waiting on java.lang.unixproc...@34b56f69
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.lang.Object.wait(Object.java:485)
>>>>>  java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>>>  org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>>>  org.apache.hadoop.util.Shell.run(Shell.java:134)
>>>>> 
>>>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>>>> 
>>>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>>>> 
>>>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>>>> Thread 29550 (Thread-15193):
>>>>> State: WAITING
>>>>> Blocked count: 13
>>>>> Waited count: 14
>>>>> Waiting on java.lang.obj...@72c09161
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.lang.Object.wait(Object.java:485)
>>>>>  org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>>>> Thread 29501 (process reaper):
>>>>> State: RUNNABLE
>>>>> Blocked count: 0
>>>>> Waited count: 0
>>>>> Stack:
>>>>>  java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>>>  java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>>>  java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>>>> Thread 29500 (JVM Runner jvm_201004301437_0050_m_861348535 spawned.):
>>>>> State: WAITING
>>>>> Blocked count: 1
>>>>> Waited count: 2
>>>>> Waiting on java.lang.unixproc...@56175cd4
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.lang.Object.wait(Object.java:485)
>>>>>  java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>>>  org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>>>  org.apache.hadoop.util.Shell.run(Shell.java:134)
>>>>> 
>>>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>>>> 
>>>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>>>> 
>>>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>>>> Thread 29496 (Thread-15162):
>>>>> State: WAITING
>>>>> Blocked count: 13
>>>>> Waited count: 14
>>>>> Waiting on java.lang.obj...@3b916de2
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.lang.Object.wait(Object.java:485)
>>>>>  org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>>>> Thread 29225 (process reaper):
>>>>> State: RUNNABLE
>>>>> Blocked count: 0
>>>>> Waited count: 0
>>>>> Stack:
>>>>>  java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>>>  java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>>>  java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>>>> Thread 29224 (JVM Runner jvm_201004301437_0050_m_-205889314 spawned.):
>>>>> State: WAITING
>>>>> Blocked count: 1
>>>>> Waited count: 2
>>>>> Waiting on java.lang.unixproc...@39874d3b
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.lang.Object.wait(Object.java:485)
>>>>>  java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>>>  org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>>>  org.apache.hadoop.util.Shell.run(Shell.java:134)
>>>>> 
>>>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>>>> 
>>>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>>>> 
>>>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>>>> Thread 29217 (process reaper):
>>>>> State: RUNNABLE
>>>>> Blocked count: 0
>>>>> Waited count: 0
>>>>> Stack:
>>>>>  java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>>>  java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>>>  java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>>>> Thread 29215 (JVM Runner jvm_201004301437_0050_m_-1797469636 spawned.):
>>>>> State: WAITING
>>>>> Blocked count: 1
>>>>> Waited count: 2
>>>>> Waiting on java.lang.unixproc...@2c39220f
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.lang.Object.wait(Object.java:485)
>>>>>  java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>>>  org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>>>  org.apache.hadoop.util.Shell.run(Shell.java:134)
>>>>> 
>>>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>>>> 
>>>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>>>> 
>>>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>>>> Thread 29208 (Thread-15014):
>>>>> State: WAITING
>>>>> Blocked count: 18
>>>>> Waited count: 14
>>>>> Waiting on java.lang.obj...@7ddbbae2
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.lang.Object.wait(Object.java:485)
>>>>>  org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>>>> Thread 29203 (Thread-15011):
>>>>> State: WAITING
>>>>> Blocked count: 13
>>>>> Waited count: 14
>>>>> Waiting on java.lang.obj...@2dac3f6f
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.lang.Object.wait(Object.java:485)
>>>>>  org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>>>> Thread 26714 (IPC Client (47) connection to /10.0.11.64:9001 from
>>>> hadoop):
>>>>> State: TIMED_WAITING
>>>>> Blocked count: 10075
>>>>> Waited count: 10075
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
>>>>>  org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)
>>>>> Thread 33 (Directory/File cleanup thread):
>>>>> State: WAITING
>>>>> Blocked count: 2
>>>>> Waited count: 615
>>>>> Waiting on
>>>> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@4d6c68b3
>>>>> Stack:
>>>>>  sun.misc.Unsafe.park(Native Method)
>>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>> 
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>> 
>>>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>> 
>>>> org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.java:89)
>>>>> Thread 9 (taskCleanup):
>>>>> State: WAITING
>>>>> Blocked count: 30
>>>>> Waited count: 92
>>>>> Waiting on
>>>> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@3298407f
>>>>> Stack:
>>>>>  sun.misc.Unsafe.park(Native Method)
>>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>> 
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>> 
>>>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>  org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:317)
>>>>>  java.lang.Thread.run(Thread.java:619)
>>>>> Thread 32 (TaskLauncher for task):
>>>>> State: WAITING
>>>>> Blocked count: 127
>>>>> Waited count: 123
>>>>> Waiting on java.util.linkedl...@c33377
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.lang.Object.wait(Object.java:485)
>>>>> 
>>>> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601)
>>>>> Thread 31 (TaskLauncher for task):
>>>>> State: WAITING
>>>>> Blocked count: 3237
>>>>> Waited count: 3147
>>>>> Waiting on java.util.linkedl...@67001629
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.lang.Object.wait(Object.java:485)
>>>>> 
>>>> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601)
>>>>> Thread 30 (Map-events fetcher for all reduce tasks on
>>>> tracker_node1t:localhost/127.0.0.1:48722):
>>>>> State: WAITING
>>>>> Blocked count: 712
>>>>> Waited count: 804
>>>>> Waiting on java.util.tree...@1fec8cf1
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.lang.Object.wait(Object.java:485)
>>>>> 
>>>> org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:606)
>>>>> Thread 28 (IPC Server handler 7 on 48722):
>>>>> State: WAITING
>>>>> Blocked count: 10
>>>>> Waited count: 3199
>>>>> Waiting on
>>>> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>> Stack:
>>>>>  sun.misc.Unsafe.park(Native Method)
>>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>> 
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>> 
>>>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 27 (IPC Server handler 6 on 48722):
>>>>> State: WAITING
>>>>> Blocked count: 265
>>>>> Waited count: 3207
>>>>> Waiting on
>>>> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>> Stack:
>>>>>  sun.misc.Unsafe.park(Native Method)
>>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>> 
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>> 
>>>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 26 (IPC Server handler 5 on 48722):
>>>>> State: WAITING
>>>>> Blocked count: 296
>>>>> Waited count: 3206
>>>>> Waiting on
>>>> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>> Stack:
>>>>>  sun.misc.Unsafe.park(Native Method)
>>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>> 
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>> 
>>>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 25 (IPC Server handler 4 on 48722):
>>>>> State: WAITING
>>>>> Blocked count: 199
>>>>> Waited count: 3202
>>>>> Waiting on
>>>> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>> Stack:
>>>>>  sun.misc.Unsafe.park(Native Method)
>>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>> 
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>> 
>>>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 24 (IPC Server handler 3 on 48722):
>>>>> State: WAITING
>>>>> Blocked count: 216
>>>>> Waited count: 3201
>>>>> Waiting on
>>>> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>> Stack:
>>>>>  sun.misc.Unsafe.park(Native Method)
>>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>> 
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>> 
>>>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 23 (IPC Server handler 2 on 48722):
>>>>> State: WAITING
>>>>> Blocked count: 119
>>>>> Waited count: 3199
>>>>> Waiting on
>>>> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>> Stack:
>>>>>  sun.misc.Unsafe.park(Native Method)
>>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>> 
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>> 
>>>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 22 (IPC Server handler 1 on 48722):
>>>>> State: WAITING
>>>>> Blocked count: 193
>>>>> Waited count: 3207
>>>>> Waiting on
>>>> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>> Stack:
>>>>>  sun.misc.Unsafe.park(Native Method)
>>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>> 
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>> 
>>>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 21 (IPC Server handler 0 on 48722):
>>>>> State: WAITING
>>>>> Blocked count: 285
>>>>> Waited count: 3203
>>>>> Waiting on
>>>> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
>>>>> Stack:
>>>>>  sun.misc.Unsafe.park(Native Method)
>>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>> 
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>> 
>>>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>> Thread 18 (IPC Server listener on 48722):
>>>>> State: RUNNABLE
>>>>> Blocked count: 0
>>>>> Waited count: 0
>>>>> Stack:
>>>>>  sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>>>  sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>>>  sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>>>  sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>>>  sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>>>  sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
>>>>>  org.apache.hadoop.ipc.Server$Listener.run(Server.java:318)
>>>>> Thread 20 (IPC Server Responder):
>>>>> State: RUNNABLE
>>>>> Blocked count: 0
>>>>> Waited count: 0
>>>>> Stack:
>>>>>  sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>>>  sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>>>  sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>>>  sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>>>  sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>>>  org.apache.hadoop.ipc.Server$Responder.run(Server.java:478)
>>>>> Thread 17 (Timer-0):
>>>>> State: TIMED_WAITING
>>>>> Blocked count: 1
>>>>> Waited count: 12764
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.util.TimerThread.mainLoop(Timer.java:509)
>>>>>  java.util.TimerThread.run(Timer.java:462)
>>>>> Thread 16 (295726...@qtp0-0 - Acceptor0
>>>> [email protected]:50060):
>>>>> State: RUNNABLE
>>>>> Blocked count: 1310
>>>>> Waited count: 1
>>>>> Stack:
>>>>>  sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>>>  sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>>>  sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>>>  sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>>>  sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>>> 
>>>> org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
>>>>>  org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
>>>>> 
>>>> org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
>>>>> 
>>>> org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
>>>>> 
>>>> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
>>>>> Thread 4 (Signal Dispatcher):
>>>>> State: RUNNABLE
>>>>> Blocked count: 0
>>>>> Waited count: 0
>>>>> Stack:
>>>>> Thread 3 (Finalizer):
>>>>> State: WAITING
>>>>> Blocked count: 1337
>>>>> Waited count: 1336
>>>>> Waiting on java.lang.ref.referencequeue$l...@5e77533b
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>>>>>  java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>>>>>  java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>>>> Thread 2 (Reference Handler):
>>>>> State: WAITING
>>>>> Blocked count: 1544
>>>>> Waited count: 1336
>>>>> Waiting on java.lang.ref.reference$l...@46efbdf1
>>>>> Stack:
>>>>>  java.lang.Object.wait(Native Method)
>>>>>  java.lang.Object.wait(Object.java:485)
>>>>>  java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>>>>> Thread 1 (main):
>>>>> State: RUNNABLE
>>>>> Blocked count: 152594
>>>>> Waited count: 279661
>>>>> Stack:
>>>>>  sun.management.ThreadImpl.getThreadInfo0(Native Method)
>>>>>  sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:147)
>>>>>  sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:123)
>>>>> 
>>>> org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
>>>>> 
>>>> org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:203)
>>>>> 
>>>> org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:1323)
>>>>> 
>>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1106)
>>>>>  org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
>>>>>  org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
>>>>> 
>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker: About
>>>> to purge task: attempt_201004301437_0050_m_000001_0
>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> addFreeSlot : current free slots : 1
>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskRunner:
>>>> attempt_201004301437_0050_m_000001_0 done; removing files.
>>>>> 2010-05-05 00:59:03,521 INFO org.apache.hadoop.mapred.IndexCache: Map ID
>>>> attempt_201004301437_0050_m_000001_0 not found in cache
>>>>> 2010-05-05 00:59:05,012 INFO org.apache.hadoop.mapred.JvmManager: JVM :
>>>> jvm_201004301437_0050_m_-205889314 exited. Number of tasks it ran: 0
>>>>> 2010-05-05 00:59:06,521 INFO org.apache.hadoop.mapred.TaskRunner:
>>>> attempt_201004301437_0050_m_000001_0 done; removing files.
>>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> LaunchTaskAction (registerTask): attempt_201004301437_0050_m_000001_0 
>>>> task's
>>>> state:FAILED_UNCLEAN
>>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker: Trying
>>>> to launch : attempt_201004301437_0050_m_000001_0
>>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker: In
>>>> TaskLauncher, current free slots : 1 and trying to launch
>>>> attempt_201004301437_0050_m_000001_0
>>>>> 2010-05-05 00:59:08,606 INFO org.apache.hadoop.mapred.TaskTracker: JVM
>>>> with ID: jvm_201004301437_0050_m_-1095145752 given task:
>>>> attempt_201004301437_0050_m_000001_0
>>>>> 2010-05-05 00:59:09,222 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> attempt_201004301437_0050_m_000001_0 0.0%
>>>>> 2010-05-05 00:59:09,560 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> attempt_201004301437_0050_m_000001_0 0.0% cleanup
>>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker: Task
>>>> attempt_201004301437_0050_m_000001_0 is done.
>>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> reported output size for attempt_201004301437_0050_m_000001_0  was 0
>>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskRunner:
>>>> attempt_201004301437_0050_m_000001_0 done; removing files.
>>>>> ----------------------------------------------------------------
>>>>> 
>>>>> The syslog of the attempt just looks like:
>>>>> ----------------------------------------------------------------
>>>>> 2010-05-04 23:48:03,365 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>>>> Initializing JVM Metrics with processName=MAP, sessionId=
>>>>> 2010-05-04 23:48:03,674 WARN datameer.dap.sdk.util.ManifestMetaData:
>>>> Failed to get version from the manifest file of jar
>>>> 'file:/data/drive0/mapred/tmp/taskTracker/archive/nurago64.local/das/jobjars/61a2133b7887b9be48d42e49002c85e0/stripped-dap-0.24.dev-job.jar/stripped-dap-0.24.dev-job.jar'
>>>> : null
>>>>> 2010-05-04 23:48:04,691 INFO org.apache.hadoop.mapred.MapTask:
>>>> numReduceTasks: 0
>>>>> 2010-05-04 23:48:04,872 INFO org.apache.hadoop.util.NativeCodeLoader:
>>>> Loaded the native-hadoop library
>>>>> 2010-05-04 23:48:04,873 INFO
>>>> org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
>>>> initialized native-zlib library
>>>>> 2010-05-04 23:48:04,874 INFO org.apache.hadoop.io.compress.CodecPool: Got
>>>> brand-new compressor
>>>>> 2010-05-05 00:59:08,753 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>>>> Initializing JVM Metrics with processName=CLEANUP, sessionId=
>>>>> 2010-05-05 00:59:09,223 INFO org.apache.hadoop.mapred.TaskRunner:
>>>> Runnning cleanup for the task
>>>>> 2010-05-05 00:59:09,442 INFO org.apache.hadoop.mapred.TaskRunner:
>>>> Task:attempt_201004301437_0050_m_000001_0 is done. And is in the process of
>>>> commiting
>>>>> 2010-05-05 00:59:09,562 INFO org.apache.hadoop.mapred.TaskRunner: Task
>>>> 'attempt_201004301437_0050_m_000001_0' done.
>>>>> ----------------------------------------------------------------
>>>>> 
>>>>> Any ideas ?
>>>>> 
>>>>> best regards
>>>>> Johannes
>>>> 
>>>> 
>>>> 
>>>> --
>>>> /*
>>>> Joe Stein
>>>> http://www.linkedin.com/in/charmalloc
>>>> */
>>>> 
>> 
>                                         
> _________________________________________________________________
> The New Busy is not the old busy. Search, chat and e-mail from your inbox.
> http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_3

Reply via email to