Hua xu created HADOOP-8300:
------------------------------

             Summary: The TaskMemoryManager thread is not interrupted when the TaskTracker is ordered to reinit by the JobTracker
                 Key: HADOOP-8300
                 URL: https://issues.apache.org/jira/browse/HADOOP-8300
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 0.20.2
            Reporter: Hua xu

When the TaskTracker is ordered to reinit by the JobTracker, it interrupts some of its threads and then reinitializes them. However, the TaskTracker does not interrupt the TaskMemoryManager thread, yet it still creates a new TaskMemoryManager thread, so the old one keeps running. I used jstack to confirm this; the dump below shows four TaskMemoryManagerThread instances:

Full thread dump Java HotSpot(TM) Server VM (1.6.0-b105 mixed mode):

"IPC Client (47) connection to /*:8012 from xhh" daemon prio=10 tid=0x083d2c00 nid=0x25b4 in Object.wait() [0x70852000..0x70852ea0]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xadbdb7c0> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:414)
        - locked <0xadbdb7c0> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:456)

"TaskLauncher for task" daemon prio=10 tid=0x082f2c00 nid=0x25b3 in Object.wait() [0x6fc63000..0x6fc63f20]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xadbcff48> (a java.util.LinkedList)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1608)
        - locked <0xadbcff48> (a java.util.LinkedList)

"TaskLauncher for task" daemon prio=10 tid=0x082f2000 nid=0x25b2 in Object.wait() [0x708f4000..0x708f4fa0]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xadbcfd00> (a java.util.LinkedList)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1608)
        - locked <0xadbcfd00> (a java.util.LinkedList)

"org.apache.hadoop.mapred.TaskMemoryManagerThread" daemon prio=10 tid=0x082f1400 nid=0x25b1 waiting on condition [0x6fb47000..0x6fb48020]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.TaskMemoryManagerThread.run(TaskMemoryManagerThread.java:241)

"Map-events fetcher on tracker_*:*/*:50070" daemon prio=10 tid=0x082f1000 nid=0x25b0 in Object.wait() [0x708a3000..0x708a40a0]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xaf1227b0> (a java.util.TreeMap)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:592)
        - locked <0xaf1227b0> (a java.util.TreeMap)

"IPC Client (47) connection to /* from xhh" daemon prio=10 tid=0x082f9000 nid=0x25af in Object.wait() [0x6fc12000..0x6fc13120]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xadb36b80> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:414)
        - locked <0xadb36b80> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:456)

"IPC Server handler[3] on 50070" daemon prio=10 tid=0x083a3400 nid=0x25ae waiting on condition [0x6fd0b000..0x6fd0bda0]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0xaf121468> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1889)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:963)

"IPC Server handler[2] on 50070" daemon prio=10 tid=0x0847b400 nid=0x25ad waiting on condition [0x6fcba000..0x6fcbae20]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0xaf121468> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1889)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:963)

"IPC Server handler[1] on 50070" daemon prio=10 tid=0x08392c00 nid=0x25ac waiting on condition [0x6fdfe000..0x6fdfeea0]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0xaf121468> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1889)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:963)

"IPC Server handler[0] on 50070" daemon prio=10 tid=0x0846f400 nid=0x25ab waiting on condition [0x6fdad000..0x6fdadf20]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0xaf121468> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1889)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:963)

"IPC Server listener on 50070" daemon prio=10 tid=0x08345400 nid=0x25aa runnable [0x7017f000..0x7017ffa0]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:184)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0xaf121950> (a sun.nio.ch.Util$1)
        - locked <0xaf121940> (a java.util.Collections$UnmodifiableSet)
        - locked <0xaf121740> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
        at org.apache.hadoop.ipc.Server$Listener.run(Server.java:314)

"IPC Server Responder" daemon prio=10 tid=0x0836b000 nid=0x25a9 runnable [0x6fd5c000..0x6fd5d020]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:184)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0xaf121f80> (a sun.nio.ch.Util$1)
        - locked <0xaf121f70> (a java.util.Collections$UnmodifiableSet)
        - locked <0xaf121d88> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at org.apache.hadoop.ipc.Server$Responder.run(Server.java:483)

"org.apache.hadoop.mapred.TaskMemoryManagerThread" daemon prio=10 tid=0x08458000 nid=0x259d waiting on condition [0x6fa03000..0x6fa03f20]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.TaskMemoryManagerThread.run(TaskMemoryManagerThread.java:241)

"Attach Listener" daemon prio=10 tid=0x0845a800 nid=0x2593 runnable [0x00000000..0x00000000]
   java.lang.Thread.State: RUNNABLE

"org.apache.hadoop.mapred.TaskMemoryManagerThread" daemon prio=10 tid=0x0836f400 nid=0x257c waiting on condition [0x6faf6000..0x6faf6ea0]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.TaskMemoryManagerThread.run(TaskMemoryManagerThread.java:241)

"Directory/File cleanup thread" daemon prio=10 tid=0x0845d000 nid=0x256b waiting on condition [0x6fa54000..0x6fa550a0]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0xaf10a1d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1889)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
        at org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.java:88)

"taskCleanup" daemon prio=10 tid=0x0845bc00 nid=0x256a waiting on condition [0x6faa5000..0x6faa6120]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0xaf10a088> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1889)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
        at org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:310)
        at java.lang.Thread.run(Thread.java:619)

"org.apache.hadoop.mapred.TaskMemoryManagerThread" daemon prio=10 tid=0x083c5000 nid=0x2567 waiting on condition [0x6fb98000..0x6fb98ea0]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.TaskMemoryManagerThread.run(TaskMemoryManagerThread.java:241)

"Timer-0" daemon prio=10 tid=0x702c7800 nid=0x255a in Object.wait() [0x70390000..0x70390f20]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xaf10a3a8> (a java.util.TaskQueue)
        at java.util.TimerThread.mainLoop(Timer.java:509)
        - locked <0xaf10a3a8> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:462)

"14871751@qtp0-0 - Acceptor0 SelectChannelConnector@*3:50060" prio=10 tid=0x082be400 nid=0x2559 runnable [0x701fe000..0x701fefa0]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:184)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0xaf119da0> (a sun.nio.ch.Util$1)
        - locked <0xaf119db0> (a java.util.Collections$UnmodifiableSet)
        - locked <0xaf119d60> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
        at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
        at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
        at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

"Low Memory Detector" daemon prio=10 tid=0x08187400 nid=0x254a runnable [0x00000000..0x00000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x08185c00 nid=0x2549 waiting on condition [0x00000000..0x70a474b8]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x08184800 nid=0x2548 waiting on condition [0x00000000..0x70ac8538]
   java.lang.Thread.State: RUNNABLE

"JDWP Command Reader" daemon prio=10 tid=0x08177400 nid=0x2545 runnable [0x00000000..0x00000000]
   java.lang.Thread.State: RUNNABLE

"JDWP Event Helper Thread" daemon prio=10 tid=0x08175c00 nid=0x2542 runnable [0x00000000..0x00000000]
   java.lang.Thread.State: RUNNABLE

"JDWP Transport Listener: dt_socket" daemon prio=10 tid=0x08173800 nid=0x2541 runnable [0x00000000..0x70bbbe70]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x0816bc00 nid=0x253f runnable [0x00000000..0x70c10a80]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x08152800 nid=0x253e in Object.wait() [0x70c6f000..0x70c6fe20]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xaf11a950> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0xaf11a950> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x08151c00 nid=0x253d in Object.wait() [0x70cc0000..0x70cc0ea0]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xaf11a970> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0xaf11a970> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x0806b400 nid=0x2536 at breakpoint[0xb7fdf000..0xb7fe01f8]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1096)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1727)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3093)

"VM Thread" prio=10 tid=0x0814f400 nid=0x253c runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x08072000 nid=0x2538 runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x08073000 nid=0x2539 runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x08074000 nid=0x253a runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x08075400 nid=0x253b runnable

"VM Periodic Task Thread" prio=10 tid=0x08188c00 nid=0x254b waiting on condition

JNI global references: 3317

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
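
For illustration only, here is a minimal sketch of one way the leak described in this report could be avoided: interrupt the old monitor thread and wait for it to exit before starting a replacement. ReinitSketch, MonitorThread, and Tracker are hypothetical stand-ins, not Hadoop classes, and this is not the patch for this issue; the real TaskTracker code may need a different approach.

```java
// Hypothetical stand-ins; none of these names come from the Hadoop source.
class ReinitSketch {

    /** Stand-in for a periodic monitor such as TaskMemoryManagerThread. */
    static class MonitorThread extends Thread {
        MonitorThread() {
            super("memory-monitor-sketch");
            setDaemon(true);
        }

        @Override
        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // A real monitor would check task memory usage here.
                    Thread.sleep(50);
                }
            } catch (InterruptedException e) {
                // Interrupted while sleeping: leave the loop and let the thread end.
            }
        }
    }

    /** Stand-in for the component that is told to reinitialize. */
    static class Tracker {
        private MonitorThread monitor;

        void initialize() {
            monitor = new MonitorThread();
            monitor.start();
        }

        /** Stops the current monitor, if any, and waits for it to exit. */
        void close() {
            if (monitor == null) {
                return;
            }
            monitor.interrupt();
            try {
                monitor.join();
            } catch (InterruptedException e) {
                // Preserve the caller's interrupt status.
                Thread.currentThread().interrupt();
            }
            monitor = null;
        }

        /** Reinit: stop the old monitor first, then start a new one. */
        void reinit() {
            close();
            initialize();
        }

        Thread currentMonitor() {
            return monitor;
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        tracker.initialize();
        Thread old = tracker.currentMonitor();
        tracker.reinit();
        System.out.println("old monitor alive after reinit: " + old.isAlive());
        tracker.close();
    }
}
```

With this shape, each reinit leaves at most one monitor thread running, whereas the dump above shows one sleeping TaskMemoryManagerThread left behind per reinit.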