Hello Johannes,

        Is this about the common tip of using StringBuilder/StringBuffer
instead of '+' when concatenating Strings? I would like to know more about
the problem and how you resolved it.


Regards,
Raghava.

On Fri, May 28, 2010 at 6:43 AM, Johannes Zillmann <[email protected]
> wrote:

> Thanks for the valuable input!
> It was some string concatenation i did in each map call and it got super
> long. Resolving this made the problem go away!
>
> cheers
> Johannes
>
> On May 7, 2010, at 6:35 PM, Michael Segel wrote:
>
> >
> > There's a couple of things...
> >
> > 1) Inside the map() method, if you're taking longer than the timeout, you
> will fail.
> > 2) You could also fail if you run in to GC problems that cause timeout
> failures.
> >
> > What sometimes happens in long running jobs, you tend to see your GC take
> longer as the job runs.
> >
> > Do you log a start and end time in each iteration of map()? This will
> tell you if the individual time spent in a map() method call is too long.
> >
> > Also look at GC. Trying to go through all the logs can really drive one
> to drink. :-)
> >
> > HTH
> >
> > -Mike
> >
> >
> >> Subject: Re: hanging task
> >> From: [email protected]
> >> Date: Fri, 7 May 2010 17:59:01 +0200
> >> To: [email protected]
> >>
> >> Sorry, forgot to say that on map task runs about more then one hour.
> >> So i do not think its the progress thing. So there are successful map
> tasks and those which are failing failing long after the 10 min timeout.
> >> And the question is why there is not any stack of the map task visibly
> in the thread dump.
> >> Is it swallowed or is hadoop stuck right after the mapper is done ?
> >>
> >> Johannes
> >>
> >>
> >> On May 7, 2010, at 5:16 PM, Raghava Mutharaju wrote:
> >>
> >>> Hello Johannes,
> >>>
> >>> I had a similar problem and I used the first approach suggested by
> Joseph
> >>> i.e. to report the status back. I used progress() method as well as the
> >>> setStatus() method. progress() method javadoc link is given below.
> >>>
> >>>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Progressable.html#progress%28%29
> >>>
> >>> This should solve your problem. If you come to know of the reason why
> it is
> >>> taking more time than you expected, please mention it here.
> >>>
> >>> Regards,
> >>> Raghava.
> >>>
> >>> On Fri, May 7, 2010 at 10:31 AM, Joseph Stein <[email protected]>
> wrote:
> >>>
> >>>> You need to either report status or increment a counter from within
> >>>> your task. In your Java code there is a little trick to help the job
> >>>> be “aware” within the cluster of tasks that are not dead but just
> >>>> working hard.  During execution of a task there is no built in
> >>>> reporting that the job is running as expected if it is not writing
> >>>> out.  So this means that if your tasks are taking up a lot of time
> >>>> doing work it is possible the cluster will see that task as failed
> >>>> (based on the mapred.task.tracker.expiry.interval setting).
> >>>>
> >>>> Have no fear there is a way to tell cluster that your task is doing
> >>>> just fine.  You have 2 ways todo this you can either report the status
> >>>> or increment a counter.  Both of these will cause the task tracker to
> >>>> properly know the task is ok and this will get seen by the jobtracker
> >>>> in turn.  Both of these options are explained in the JavaDoc
> >>>>
> >>>>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reporter.html
> >>>>
> >>>> Here are some pointers in general you may find useful
> >>>>
> >>>>
> http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
> >>>>
> >>>> On Fri, May 7, 2010 at 10:26 AM, Johannes Zillmann
> >>>> <[email protected]> wrote:
> >>>>> Hi hadoop folks,
> >>>>>
> >>>>> i'm encountering following problem on a 4 node cluster running
> >>>> hadoop-0.20.2.
> >>>>>
> >>>>> Its a map only job reading about 9 GB data from outside of hadoop. 31
> map
> >>>> tasks at all while 12 map tasks running at a time.
> >>>>> The first wave of mappers finishes successfully.
> >>>>> Later on the first tasks are failing and they do that shortly before
> >>>> finishing:
> >>>>> Task attempt_201005061210_0002_m_000001_0 failed to report status for
> 602
> >>>> seconds. Killing!
> >>>>> Task attempt_201005061210_0002_m_000001_1 failed to report status for
> 600
> >>>> seconds. Killing!
> >>>>> Task attempt_201005061210_0002_m_000001_2 failed to report status for
> 602
> >>>> seconds. Killing!
> >>>>>
> >>>>> The unusual is that i do not find any signs of the job code in the
> thread
> >>>> dump the tasktracker takes automatically:
> >>>>> ----------------------------------------------------------------
> >>>>> 2010-05-05 00:59:03,515 INFO org.apache.hadoop.mapred.TaskTracker:
> >>>> attempt_201004301437_0050_m_000001_0: Task
> >>>> attempt_201004301437_0050_m_000001_0 failed to report status for 601
> >>>> seconds. Killing!
> >>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker:
> >>>> Process Thread Dump: lost task
> >>>>> 34 active threads
> >>>>> Thread 29555 (process reaper):
> >>>>> State: RUNNABLE
> >>>>> Blocked count: 0
> >>>>> Waited count: 0
> >>>>> Stack:
> >>>>>  java.lang.UNIXProcess.waitForProcessExit(Native Method)
> >>>>>  java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
> >>>>>  java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
> >>>>> Thread 29554 (JVM Runner jvm_201004301437_0050_m_1465855495
> spawned.):
> >>>>> State: WAITING
> >>>>> Blocked count: 1
> >>>>> Waited count: 2
> >>>>> Waiting on java.lang.unixproc...@34b56f69
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.lang.Object.wait(Object.java:485)
> >>>>>  java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
> >>>>>  org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
> >>>>>  org.apache.hadoop.util.Shell.run(Shell.java:134)
> >>>>>
> >>>>
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
> >>>>> Thread 29550 (Thread-15193):
> >>>>> State: WAITING
> >>>>> Blocked count: 13
> >>>>> Waited count: 14
> >>>>> Waiting on java.lang.obj...@72c09161
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.lang.Object.wait(Object.java:485)
> >>>>>  org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
> >>>>> Thread 29501 (process reaper):
> >>>>> State: RUNNABLE
> >>>>> Blocked count: 0
> >>>>> Waited count: 0
> >>>>> Stack:
> >>>>>  java.lang.UNIXProcess.waitForProcessExit(Native Method)
> >>>>>  java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
> >>>>>  java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
> >>>>> Thread 29500 (JVM Runner jvm_201004301437_0050_m_861348535 spawned.):
> >>>>> State: WAITING
> >>>>> Blocked count: 1
> >>>>> Waited count: 2
> >>>>> Waiting on java.lang.unixproc...@56175cd4
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.lang.Object.wait(Object.java:485)
> >>>>>  java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
> >>>>>  org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
> >>>>>  org.apache.hadoop.util.Shell.run(Shell.java:134)
> >>>>>
> >>>>
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
> >>>>> Thread 29496 (Thread-15162):
> >>>>> State: WAITING
> >>>>> Blocked count: 13
> >>>>> Waited count: 14
> >>>>> Waiting on java.lang.obj...@3b916de2
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.lang.Object.wait(Object.java:485)
> >>>>>  org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
> >>>>> Thread 29225 (process reaper):
> >>>>> State: RUNNABLE
> >>>>> Blocked count: 0
> >>>>> Waited count: 0
> >>>>> Stack:
> >>>>>  java.lang.UNIXProcess.waitForProcessExit(Native Method)
> >>>>>  java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
> >>>>>  java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
> >>>>> Thread 29224 (JVM Runner jvm_201004301437_0050_m_-205889314
> spawned.):
> >>>>> State: WAITING
> >>>>> Blocked count: 1
> >>>>> Waited count: 2
> >>>>> Waiting on java.lang.unixproc...@39874d3b
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.lang.Object.wait(Object.java:485)
> >>>>>  java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
> >>>>>  org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
> >>>>>  org.apache.hadoop.util.Shell.run(Shell.java:134)
> >>>>>
> >>>>
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
> >>>>> Thread 29217 (process reaper):
> >>>>> State: RUNNABLE
> >>>>> Blocked count: 0
> >>>>> Waited count: 0
> >>>>> Stack:
> >>>>>  java.lang.UNIXProcess.waitForProcessExit(Native Method)
> >>>>>  java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
> >>>>>  java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
> >>>>> Thread 29215 (JVM Runner jvm_201004301437_0050_m_-1797469636
> spawned.):
> >>>>> State: WAITING
> >>>>> Blocked count: 1
> >>>>> Waited count: 2
> >>>>> Waiting on java.lang.unixproc...@2c39220f
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.lang.Object.wait(Object.java:485)
> >>>>>  java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
> >>>>>  org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
> >>>>>  org.apache.hadoop.util.Shell.run(Shell.java:134)
> >>>>>
> >>>>
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
> >>>>> Thread 29208 (Thread-15014):
> >>>>> State: WAITING
> >>>>> Blocked count: 18
> >>>>> Waited count: 14
> >>>>> Waiting on java.lang.obj...@7ddbbae2
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.lang.Object.wait(Object.java:485)
> >>>>>  org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
> >>>>> Thread 29203 (Thread-15011):
> >>>>> State: WAITING
> >>>>> Blocked count: 13
> >>>>> Waited count: 14
> >>>>> Waiting on java.lang.obj...@2dac3f6f
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.lang.Object.wait(Object.java:485)
> >>>>>  org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
> >>>>> Thread 26714 (IPC Client (47) connection to /10.0.11.64:9001 from
> >>>> hadoop):
> >>>>> State: TIMED_WAITING
> >>>>> Blocked count: 10075
> >>>>> Waited count: 10075
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
> >>>>>  org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)
> >>>>> Thread 33 (Directory/File cleanup thread):
> >>>>> State: WAITING
> >>>>> Blocked count: 2
> >>>>> Waited count: 615
> >>>>> Waiting on
> >>>>
> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@4d6c68b3
> >>>>> Stack:
> >>>>>  sun.misc.Unsafe.park(Native Method)
> >>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> >>>>>
> >>>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> >>>>>
> >>>>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.java:89)
> >>>>> Thread 9 (taskCleanup):
> >>>>> State: WAITING
> >>>>> Blocked count: 30
> >>>>> Waited count: 92
> >>>>> Waiting on
> >>>>
> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@3298407f
> >>>>> Stack:
> >>>>>  sun.misc.Unsafe.park(Native Method)
> >>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> >>>>>
> >>>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> >>>>>
> >>>>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> >>>>>  org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:317)
> >>>>>  java.lang.Thread.run(Thread.java:619)
> >>>>> Thread 32 (TaskLauncher for task):
> >>>>> State: WAITING
> >>>>> Blocked count: 127
> >>>>> Waited count: 123
> >>>>> Waiting on java.util.linkedl...@c33377
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.lang.Object.wait(Object.java:485)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601)
> >>>>> Thread 31 (TaskLauncher for task):
> >>>>> State: WAITING
> >>>>> Blocked count: 3237
> >>>>> Waited count: 3147
> >>>>> Waiting on java.util.linkedl...@67001629
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.lang.Object.wait(Object.java:485)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601)
> >>>>> Thread 30 (Map-events fetcher for all reduce tasks on
> >>>> tracker_node1t:localhost/127.0.0.1:48722):
> >>>>> State: WAITING
> >>>>> Blocked count: 712
> >>>>> Waited count: 804
> >>>>> Waiting on java.util.tree...@1fec8cf1
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.lang.Object.wait(Object.java:485)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:606)
> >>>>> Thread 28 (IPC Server handler 7 on 48722):
> >>>>> State: WAITING
> >>>>> Blocked count: 10
> >>>>> Waited count: 3199
> >>>>> Waiting on
> >>>>
> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
> >>>>> Stack:
> >>>>>  sun.misc.Unsafe.park(Native Method)
> >>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> >>>>>
> >>>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> >>>>>
> >>>>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> >>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> >>>>> Thread 27 (IPC Server handler 6 on 48722):
> >>>>> State: WAITING
> >>>>> Blocked count: 265
> >>>>> Waited count: 3207
> >>>>> Waiting on
> >>>>
> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
> >>>>> Stack:
> >>>>>  sun.misc.Unsafe.park(Native Method)
> >>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> >>>>>
> >>>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> >>>>>
> >>>>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> >>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> >>>>> Thread 26 (IPC Server handler 5 on 48722):
> >>>>> State: WAITING
> >>>>> Blocked count: 296
> >>>>> Waited count: 3206
> >>>>> Waiting on
> >>>>
> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
> >>>>> Stack:
> >>>>>  sun.misc.Unsafe.park(Native Method)
> >>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> >>>>>
> >>>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> >>>>>
> >>>>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> >>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> >>>>> Thread 25 (IPC Server handler 4 on 48722):
> >>>>> State: WAITING
> >>>>> Blocked count: 199
> >>>>> Waited count: 3202
> >>>>> Waiting on
> >>>>
> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
> >>>>> Stack:
> >>>>>  sun.misc.Unsafe.park(Native Method)
> >>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> >>>>>
> >>>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> >>>>>
> >>>>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> >>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> >>>>> Thread 24 (IPC Server handler 3 on 48722):
> >>>>> State: WAITING
> >>>>> Blocked count: 216
> >>>>> Waited count: 3201
> >>>>> Waiting on
> >>>>
> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
> >>>>> Stack:
> >>>>>  sun.misc.Unsafe.park(Native Method)
> >>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> >>>>>
> >>>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> >>>>>
> >>>>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> >>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> >>>>> Thread 23 (IPC Server handler 2 on 48722):
> >>>>> State: WAITING
> >>>>> Blocked count: 119
> >>>>> Waited count: 3199
> >>>>> Waiting on
> >>>>
> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
> >>>>> Stack:
> >>>>>  sun.misc.Unsafe.park(Native Method)
> >>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> >>>>>
> >>>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> >>>>>
> >>>>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> >>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> >>>>> Thread 22 (IPC Server handler 1 on 48722):
> >>>>> State: WAITING
> >>>>> Blocked count: 193
> >>>>> Waited count: 3207
> >>>>> Waiting on
> >>>>
> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
> >>>>> Stack:
> >>>>>  sun.misc.Unsafe.park(Native Method)
> >>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> >>>>>
> >>>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> >>>>>
> >>>>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> >>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> >>>>> Thread 21 (IPC Server handler 0 on 48722):
> >>>>> State: WAITING
> >>>>> Blocked count: 285
> >>>>> Waited count: 3203
> >>>>> Waiting on
> >>>>
> java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@b4848ae
> >>>>> Stack:
> >>>>>  sun.misc.Unsafe.park(Native Method)
> >>>>>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> >>>>>
> >>>>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> >>>>>
> >>>>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> >>>>>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> >>>>> Thread 18 (IPC Server listener on 48722):
> >>>>> State: RUNNABLE
> >>>>> Blocked count: 0
> >>>>> Waited count: 0
> >>>>> Stack:
> >>>>>  sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> >>>>>  sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> >>>>>  sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> >>>>>  sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> >>>>>  sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> >>>>>  sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
> >>>>>  org.apache.hadoop.ipc.Server$Listener.run(Server.java:318)
> >>>>> Thread 20 (IPC Server Responder):
> >>>>> State: RUNNABLE
> >>>>> Blocked count: 0
> >>>>> Waited count: 0
> >>>>> Stack:
> >>>>>  sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> >>>>>  sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> >>>>>  sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> >>>>>  sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> >>>>>  sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> >>>>>  org.apache.hadoop.ipc.Server$Responder.run(Server.java:478)
> >>>>> Thread 17 (Timer-0):
> >>>>> State: TIMED_WAITING
> >>>>> Blocked count: 1
> >>>>> Waited count: 12764
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.util.TimerThread.mainLoop(Timer.java:509)
> >>>>>  java.util.TimerThread.run(Timer.java:462)
> >>>>> Thread 16 (295726...@qtp0-0 - Acceptor0
> >>>> [email protected]:50060):
> >>>>> State: RUNNABLE
> >>>>> Blocked count: 1310
> >>>>> Waited count: 1
> >>>>> Stack:
> >>>>>  sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> >>>>>  sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> >>>>>  sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> >>>>>  sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> >>>>>  sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> >>>>>
> >>>>
> org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
> >>>>>
>  org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
> >>>>>
> >>>>
> org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
> >>>>>
> >>>>
> org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
> >>>>>
> >>>>
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> >>>>> Thread 4 (Signal Dispatcher):
> >>>>> State: RUNNABLE
> >>>>> Blocked count: 0
> >>>>> Waited count: 0
> >>>>> Stack:
> >>>>> Thread 3 (Finalizer):
> >>>>> State: WAITING
> >>>>> Blocked count: 1337
> >>>>> Waited count: 1336
> >>>>> Waiting on java.lang.ref.referencequeue$l...@5e77533b
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
> >>>>>  java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
> >>>>>  java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> >>>>> Thread 2 (Reference Handler):
> >>>>> State: WAITING
> >>>>> Blocked count: 1544
> >>>>> Waited count: 1336
> >>>>> Waiting on java.lang.ref.reference$l...@46efbdf1
> >>>>> Stack:
> >>>>>  java.lang.Object.wait(Native Method)
> >>>>>  java.lang.Object.wait(Object.java:485)
> >>>>>  java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
> >>>>> Thread 1 (main):
> >>>>> State: RUNNABLE
> >>>>> Blocked count: 152594
> >>>>> Waited count: 279661
> >>>>> Stack:
> >>>>>  sun.management.ThreadImpl.getThreadInfo0(Native Method)
> >>>>>  sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:147)
> >>>>>  sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:123)
> >>>>>
> >>>>
> org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
> >>>>>
> >>>>
> org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:203)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:1323)
> >>>>>
> >>>>
> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1106)
> >>>>>  org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
> >>>>>  org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
> >>>>>
> >>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker:
> About
> >>>> to purge task: attempt_201004301437_0050_m_000001_0
> >>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker:
> >>>> addFreeSlot : current free slots : 1
> >>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskRunner:
> >>>> attempt_201004301437_0050_m_000001_0 done; removing files.
> >>>>> 2010-05-05 00:59:03,521 INFO org.apache.hadoop.mapred.IndexCache: Map
> ID
> >>>> attempt_201004301437_0050_m_000001_0 not found in cache
> >>>>> 2010-05-05 00:59:05,012 INFO org.apache.hadoop.mapred.JvmManager: JVM
> :
> >>>> jvm_201004301437_0050_m_-205889314 exited. Number of tasks it ran: 0
> >>>>> 2010-05-05 00:59:06,521 INFO org.apache.hadoop.mapred.TaskRunner:
> >>>> attempt_201004301437_0050_m_000001_0 done; removing files.
> >>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker:
> >>>> LaunchTaskAction (registerTask): attempt_201004301437_0050_m_000001_0
> task's
> >>>> state:FAILED_UNCLEAN
> >>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker:
> Trying
> >>>> to launch : attempt_201004301437_0050_m_000001_0
> >>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker: In
> >>>> TaskLauncher, current free slots : 1 and trying to launch
> >>>> attempt_201004301437_0050_m_000001_0
> >>>>> 2010-05-05 00:59:08,606 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM
> >>>> with ID: jvm_201004301437_0050_m_-1095145752 given task:
> >>>> attempt_201004301437_0050_m_000001_0
> >>>>> 2010-05-05 00:59:09,222 INFO org.apache.hadoop.mapred.TaskTracker:
> >>>> attempt_201004301437_0050_m_000001_0 0.0%
> >>>>> 2010-05-05 00:59:09,560 INFO org.apache.hadoop.mapred.TaskTracker:
> >>>> attempt_201004301437_0050_m_000001_0 0.0% cleanup
> >>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker:
> Task
> >>>> attempt_201004301437_0050_m_000001_0 is done.
> >>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker:
> >>>> reported output size for attempt_201004301437_0050_m_000001_0  was 0
> >>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskRunner:
> >>>> attempt_201004301437_0050_m_000001_0 done; removing files.
> >>>>> ----------------------------------------------------------------
> >>>>>
> >>>>> The syslog of the attempt just looks like:
> >>>>> ----------------------------------------------------------------
> >>>>> 2010-05-04 23:48:03,365 INFO
> org.apache.hadoop.metrics.jvm.JvmMetrics:
> >>>> Initializing JVM Metrics with processName=MAP, sessionId=
> >>>>> 2010-05-04 23:48:03,674 WARN datameer.dap.sdk.util.ManifestMetaData:
> >>>> Failed to get version from the manifest file of jar
> >>>>
> 'file:/data/drive0/mapred/tmp/taskTracker/archive/nurago64.local/das/jobjars/61a2133b7887b9be48d42e49002c85e0/stripped-dap-0.24.dev-job.jar/stripped-dap-0.24.dev-job.jar'
> >>>> : null
> >>>>> 2010-05-04 23:48:04,691 INFO org.apache.hadoop.mapred.MapTask:
> >>>> numReduceTasks: 0
> >>>>> 2010-05-04 23:48:04,872 INFO org.apache.hadoop.util.NativeCodeLoader:
> >>>> Loaded the native-hadoop library
> >>>>> 2010-05-04 23:48:04,873 INFO
> >>>> org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
> >>>> initialized native-zlib library
> >>>>> 2010-05-04 23:48:04,874 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> >>>> brand-new compressor
> >>>>> 2010-05-05 00:59:08,753 INFO
> org.apache.hadoop.metrics.jvm.JvmMetrics:
> >>>> Initializing JVM Metrics with processName=CLEANUP, sessionId=
> >>>>> 2010-05-05 00:59:09,223 INFO org.apache.hadoop.mapred.TaskRunner:
> >>>> Runnning cleanup for the task
> >>>>> 2010-05-05 00:59:09,442 INFO org.apache.hadoop.mapred.TaskRunner:
> >>>> Task:attempt_201004301437_0050_m_000001_0 is done. And is in the
> process of
> >>>> commiting
> >>>>> 2010-05-05 00:59:09,562 INFO org.apache.hadoop.mapred.TaskRunner:
> Task
> >>>> 'attempt_201004301437_0050_m_000001_0' done.
> >>>>> ----------------------------------------------------------------
> >>>>>
> >>>>> Any ideas ?
> >>>>>
> >>>>> best regards
> >>>>> Johannes
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> /*
> >>>> Joe Stein
> >>>> http://www.linkedin.com/in/charmalloc
> >>>> */
> >>>>
> >>
> >
> > _________________________________________________________________
> > The New Busy is not the old busy. Search, chat and e-mail from your
> inbox.
> >
> http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_3
>
>

Reply via email to