Here's a thread dump of the problem.  When I kicked off a job this
weekend, it actually completed.  I kicked off another one yesterday
and ran into the problem.

Calvin


On 6/1/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
Calvin Yu wrote:
> public class Test {
>  public static void main(String[] args) {
>    System.out.println("interrupting..");
>    Thread.currentThread().interrupt();
>    try {
>      Thread.sleep(100);
>      System.out.println("done.");
>    } catch (InterruptedException e) {
>      e.printStackTrace();
>    }
>  }
> }
>
> Granted, this is an oversimplified test and won't catch JVM bugs.

Yes, but it does show that this is probably the intended behavior: an
interrupt should be sufficient, even if it doesn't arrive during the
call to sleep().  We still don't know why join() hung, whether it's a
JVM bug or a bug in Hadoop.  Either way, I think we can code this
defensively without using join().

Doug
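One way to read "code this defensively" is sketched below. It is a minimal sketch, not Hadoop's code. It assumes a background reporter thread shaped like the "Sort progress reporter" in the dump. The names ProgressReporter, finish() and stopReporter() are hypothetical.

The sketch stops the thread with both a volatile flag and an interrupt, so a lost or swallowed interrupt alone cannot keep it running. It also replaces the unbounded join() with a bounded join(timeout). That bounded wait is a swapped-in choice. Because the thread is a daemon, not waiting at all would be another option.

```java
/**
 * Hypothetical sketch: a daemon reporter thread that can be stopped
 * without an unbounded Thread.join(). Not taken from Hadoop's MapTask.
 */
public class ReporterShutdownSketch {

  static class ProgressReporter extends Thread {
    // Checked on every iteration, so the thread exits even if an interrupt is lost.
    private volatile boolean done = false;
    private final long intervalMs;

    ProgressReporter(long intervalMs) {
      super("progress reporter (sketch)");
      this.intervalMs = intervalMs;
      setDaemon(true); // never keeps the JVM alive on its own
    }

    /** Requests shutdown: sets the flag and wakes the thread if it is in sleep(). */
    void finish() {
      done = true;
      interrupt();
    }

    @Override
    public void run() {
      while (!done) {
        try {
          Thread.sleep(intervalMs);
        } catch (InterruptedException e) {
          // Treat an interrupt as a request to stop.
          break;
        }
        // A real reporter would report progress here (omitted in this sketch).
      }
    }
  }

  /**
   * Signals the reporter to stop and waits at most timeoutMs for it.
   * Returns true if the thread has exited, false if it is still running.
   */
  static boolean stopReporter(ProgressReporter r, long timeoutMs) {
    r.finish();
    try {
      r.join(timeoutMs); // bounded wait instead of join() with no timeout
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
    }
    return !r.isAlive();
  }

  public static void main(String[] args) throws InterruptedException {
    ProgressReporter r = new ProgressReporter(1000);
    r.start();
    Thread.sleep(50);
    boolean stopped = stopReporter(r, 5000);
    System.out.println(stopped ? "reporter stopped" : "reporter still running; continuing anyway");
  }
}
```

If stopReporter() returns false, the caller can log that and carry on. The alternative is to block forever, which is what "main" does in Thread.join in the dump.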

Full thread dump Java HotSpot(TM) Server VM (1.5.0_07-b03 mixed mode):

"IPC Client connection to 0.0.0.0/0.0.0.0:50050" daemon prio=10 tid=0x08156c20 
nid=0x2a in Object.wait() [0xe635b000..0xe635bbb8]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:213)
        - locked <0xf7e90000> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:252)

"Sort progress reporter for task task_0086_m_000064_0" daemon prio=10 
tid=0x08599280 nid=0x12 waiting on condition [0xe6217000..0xe6217bb8]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.MapTask$2.run(MapTask.java:204)

"[EMAIL PROTECTED]" daemon prio=10 tid=0x08597dd8 nid=0x11 waiting on condition 
[0xe6259000..0xe6259c38]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:458)
        at java.lang.Thread.run(Thread.java:595)

"Pinger for task_0086_m_000064_0" daemon prio=10 tid=0x0862e0f8 nid=0xf waiting 
on condition [0xe62dd000..0xe62ddd38]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.TaskTracker$Child$1.run(TaskTracker.java:1488)
        at java.lang.Thread.run(Thread.java:595)

"org.apache.hadoop.io.ObjectWritable Connection Culler" daemon prio=10 
tid=0x08490448 nid=0xd waiting on condition [0xe639d000..0xe639da38]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.ipc.Client$ConnectionCuller.run(Client.java:401)

"Low Memory Detector" daemon prio=10 tid=0x081901a8 nid=0xb runnable 
[0x00000000..0x00000000]

"CompilerThread1" daemon prio=10 tid=0x0818e938 nid=0xa waiting on condition 
[0x00000000..0xf8178f4c]

"CompilerThread0" daemon prio=10 tid=0x0818db10 nid=0x9 waiting on condition 
[0x00000000..0xf81bafcc]

"AdapterThread" daemon prio=10 tid=0x0818ccc0 nid=0x8 waiting on condition 
[0x00000000..0x00000000]

"Signal Dispatcher" daemon prio=10 tid=0x0818bf30 nid=0x7 waiting on condition 
[0x00000000..0x00000000]

"Finalizer" daemon prio=10 tid=0x08180928 nid=0x6 in Object.wait() 
[0xfb29a000..0xfb29adb8]
        at java.lang.Object.wait(Native Method)
        - waiting on <0xeb67b000> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0xeb67b000> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x0817f418 nid=0x5 in Object.wait() 
[0xfb2dc000..0xfb2dca38]
        at java.lang.Object.wait(Native Method)
        - waiting on <0xeb67b520> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:474)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0xeb67b520> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x080752c0 nid=0x1 in Object.wait() [0x08046000..0x08046b8c]
        at java.lang.Object.wait(Native Method)
        - waiting on <0xeb69cdf8> (a org.apache.hadoop.mapred.MapTask$2)
        at java.lang.Thread.join(Thread.java:1095)
        - locked <0xeb69cdf8> (a org.apache.hadoop.mapred.MapTask$2)
        at java.lang.Thread.join(Thread.java:1148)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:190)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)

"VM Thread" prio=10 tid=0x0817d340 nid=0x4 runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x080f6698 nid=0x2 runnable 

"GC task thread#1 (ParallelGC)" prio=10 tid=0x080f70d8 nid=0x3 runnable 

"VM Periodic Task Thread" prio=10 tid=0x08191df0 nid=0xc waiting on condition 
