Here's a thread dump of the problem. When I kicked off a job this weekend, it actually completed. I kicked off another one yesterday and hit the problem again.

Calvin

On 6/1/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
Calvin Yu wrote:
> public class Test {
>   public static void main(String[] args) {
>     System.out.println("interrupting..");
>     Thread.currentThread().interrupt();
>     try {
>       Thread.sleep(100);
>       System.out.println("done.");
>     } catch (InterruptedException e) {
>       e.printStackTrace();
>     }
>   }
> }
>
> Granted, this is an over-simplified test, and won't test for JVM bugs.

Yes, but it does show that's probably the intended behavior: an interrupt should be sufficient, even if it doesn't arrive during the call to sleep(). We still don't know why join() hung, if it's a JVM bug, or if there's some bug in Hadoop. In either case, I think we can defensively code this without the use of join().

Doug

Full thread dump Java HotSpot(TM) Server VM (1.5.0_07-b03 mixed mode):

"IPC Client connection to 0.0.0.0/0.0.0.0:50050" daemon prio=10 tid=0x08156c20 nid=0x2a in Object.wait() [0xe635b000..0xe635bbb8]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:213)
        - locked <0xf7e90000> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:252)

"Sort progress reporter for task task_0086_m_000064_0" daemon prio=10 tid=0x08599280 nid=0x12 waiting on condition [0xe6217000..0xe6217bb8]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.MapTask$2.run(MapTask.java:204)

"[EMAIL PROTECTED]" daemon prio=10 tid=0x08597dd8 nid=0x11 waiting on condition [0xe6259000..0xe6259c38]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:458)
        at java.lang.Thread.run(Thread.java:595)

"Pinger for task_0086_m_000064_0" daemon prio=10 tid=0x0862e0f8 nid=0xf waiting on condition [0xe62dd000..0xe62ddd38]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.TaskTracker$Child$1.run(TaskTracker.java:1488)
        at java.lang.Thread.run(Thread.java:595)

"org.apache.hadoop.io.ObjectWritable Connection Culler" daemon prio=10 tid=0x08490448 nid=0xd waiting on condition [0xe639d000..0xe639da38]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.ipc.Client$ConnectionCuller.run(Client.java:401)

"Low Memory Detector" daemon prio=10 tid=0x081901a8 nid=0xb runnable [0x00000000..0x00000000]

"CompilerThread1" daemon prio=10 tid=0x0818e938 nid=0xa waiting on condition [0x00000000..0xf8178f4c]

"CompilerThread0" daemon prio=10 tid=0x0818db10 nid=0x9 waiting on condition [0x00000000..0xf81bafcc]

"AdapterThread" daemon prio=10 tid=0x0818ccc0 nid=0x8 waiting on condition [0x00000000..0x00000000]

"Signal Dispatcher" daemon prio=10 tid=0x0818bf30 nid=0x7 waiting on condition [0x00000000..0x00000000]

"Finalizer" daemon prio=10 tid=0x08180928 nid=0x6 in Object.wait() [0xfb29a000..0xfb29adb8]
        at java.lang.Object.wait(Native Method)
        - waiting on <0xeb67b000> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0xeb67b000> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x0817f418 nid=0x5 in Object.wait() [0xfb2dc000..0xfb2dca38]
        at java.lang.Object.wait(Native Method)
        - waiting on <0xeb67b520> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:474)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0xeb67b520> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x080752c0 nid=0x1 in Object.wait() [0x08046000..0x08046b8c]
        at java.lang.Object.wait(Native Method)
        - waiting on <0xeb69cdf8> (a org.apache.hadoop.mapred.MapTask$2)
        at java.lang.Thread.join(Thread.java:1095)
        - locked <0xeb69cdf8> (a org.apache.hadoop.mapred.MapTask$2)
        at java.lang.Thread.join(Thread.java:1148)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:190)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)

"VM Thread" prio=10 tid=0x0817d340 nid=0x4 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x080f6698 nid=0x2 runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x080f70d8 nid=0x3 runnable

"VM Periodic Task Thread" prio=10 tid=0x08191df0 nid=0xc waiting on condition
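
For what it's worth, the dump shows "main" parked in Thread.join() on the MapTask$2 object while the "Sort progress reporter" thread is still in Thread.sleep(). Below is a rough sketch of what "defensively coding this without join()" might look like. It is not the actual MapTask code: the Reporter class, the done flag, the latch, and the timeout values are all hypothetical names picked for illustration. The idea is that the flag makes the shutdown request visible even if the interrupt is lost, and the bounded wait means the caller can never hang forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ReporterShutdown {

    /** Hypothetical stand-in for the progress reporter; not Hadoop's MapTask code. */
    static class Reporter extends Thread {
        private volatile boolean done = false;
        private final CountDownLatch exited = new CountDownLatch(1);

        Reporter() {
            setDaemon(true); // a stuck reporter cannot keep the JVM alive
        }

        @Override
        public void run() {
            try {
                while (!done) {
                    try {
                        Thread.sleep(1000); // placeholder for "report progress, then sleep"
                    } catch (InterruptedException e) {
                        // Woken early; the loop condition decides whether to exit.
                    }
                }
            } finally {
                exited.countDown(); // signal that run() is finishing
            }
        }

        /** Requests a stop and waits at most timeoutMillis; returns true if the thread exited in time. */
        boolean shutdown(long timeoutMillis) {
            done = true;   // seen by the loop even if the interrupt has no effect
            interrupt();   // cuts the current sleep() short
            try {
                return exited.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the caller's interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        Reporter reporter = new Reporter();
        reporter.start();
        boolean stopped = reporter.shutdown(5000);
        System.out.println(stopped ? "reporter stopped" : "reporter still running, continuing anyway");
    }
}
```

If the wait times out, the caller just carries on; since the reporter is a daemon thread, it will not block JVM exit. Whether that trade-off is acceptable for the real task code is a separate question.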