[ 
https://issues.apache.org/jira/browse/HADOOP-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HADOOP-3039:
--------------------------------------------

    Attachment: patch-3039.txt

Patch adds streamer.setDaemon(true) in DFSClient, so that the DataStreamer thread no longer keeps the task JVM alive.
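
For readers unfamiliar with the daemon flag, here is a minimal, standalone sketch of the idea. It is not the DFSClient code or the attached patch: the class name DaemonStreamerSketch, the newStreamer helper, and the wait loop are hypothetical stand-ins for a background thread that polls a queue the way DataStreamer does. It only illustrates that a JVM exits once all non-daemon threads have finished, and that setDaemon(true) must be called before start().

```java
public class DaemonStreamerSketch {

    // Hypothetical stand-in for a background streamer thread: it waits on a
    // queue monitor in a loop and never finishes on its own.
    static Thread newStreamer(boolean daemon) {
        final Object queue = new Object();
        Thread streamer = new Thread(() -> {
            synchronized (queue) {
                while (true) {
                    try {
                        queue.wait(1000); // timed wait, as in the thread dump
                    } catch (InterruptedException e) {
                        return; // stop when interrupted
                    }
                }
            }
        }, "streamer-sketch");
        // Must be set before start(); afterwards setDaemon throws
        // IllegalThreadStateException.
        streamer.setDaemon(daemon);
        return streamer;
    }

    public static void main(String[] args) {
        // With daemon=true the JVM exits when main returns. With daemon=false
        // the JVM would stay alive because the streamer thread never ends.
        Thread streamer = newStreamer(true);
        streamer.start();
        System.out.println("main done, streamer daemon=" + streamer.isDaemon());
    }
}
```

Passing false to newStreamer and starting the thread should reproduce the kind of hang described in the issue: main returns, but the process stays up until it is killed externally.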

> Runtime exceptions not killing job
> ----------------------------------
>
>                 Key: HADOOP-3039
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3039
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.16.2, 0.17.0
>
>         Attachments: patch-3039.txt
>
>
> Previously, if a map or reduce task threw a runtime exception such as an NPE,
> the task, and ultimately the job, would fail in short order. In 0.16.0, when
> the reduce tasks started throwing NPEs, the tasks just hung. Eventually they
> timed out and were killed. But a task should be killed immediately if it
> throws an NPE.
> The thread dump shows:
> "DestroyJavaVM" prio=10 tid=0x0805f800 nid=0x6b5a waiting on condition [0x00000000..0xbfffcc90]
>    java.lang.Thread.State: RUNNABLE
> "Thread-12" prio=10 tid=0x083f1400 nid=0x6b87 in Object.wait() [0xa2f37000..0xa2f37eb0]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0xa3af62a0> (a java.util.LinkedList)
>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1680)
>       - locked <0xa3af62a0> (a java.util.LinkedList)
> "Comm thread for task_200803181240_0001_r_000000_0" daemon prio=10 tid=0x0841f000 nid=0x6b6f waiting on condition [0xa307c000..0xa307c130]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>       at java.lang.Thread.sleep(Native Method)
>       at org.apache.hadoop.mapred.Task$1.run(Task.java:283)
>       at java.lang.Thread.run(Unknown Source)
> "[EMAIL PROTECTED]" daemon prio=10 tid=0x083fc400 nid=0x6b6d waiting on condition [0xa30cd000..0xa30cd1b0]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>       at java.lang.Thread.sleep(Native Method)
>       at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:626)
>       at java.lang.Thread.run(Unknown Source)
> "IPC Client connection to localhost/127.0.0.1:9000" daemon prio=10 tid=0x083f6800 nid=0x6b6c in Object.wait() [0xa311d000..0xa311e030]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0xa4ac0860> (a org.apache.hadoop.ipc.Client$Connection)
>       at java.lang.Object.wait(Object.java:485)
>       at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:247)
>       - locked <0xa4ac0860> (a org.apache.hadoop.ipc.Client$Connection)
>       at org.apache.hadoop.ipc.Client$Connection.run(Client.java:286)
> It looks like the task is waiting for the DataStreamer thread (Thread-12
> above, the only non-daemon thread besides DestroyJavaVM) to exit.
> When I called streamer.setDaemon(true), the task no longer hung.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
