[ https://issues.apache.org/jira/browse/HADOOP-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HADOOP-3039:
--------------------------------------------
Attachment: patch-3039.txt
Patch that marks the DataStreamer thread as a daemon by calling streamer.setDaemon(true) in DFSClient.
> Runtime exceptions not killing job
> ----------------------------------
>
> Key: HADOOP-3039
> URL: https://issues.apache.org/jira/browse/HADOOP-3039
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.16.0
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Priority: Blocker
> Fix For: 0.16.2, 0.17.0
>
> Attachments: patch-3039.txt
>
>
> Previously, if a map or reduce task threw a runtime exception such as an NPE,
> the task, and ultimately the job, would fail in short order. In 0.16.0, when
> the reduce tasks started throwing NPEs, the tasks just hung until they timed
> out and were killed. A task should instead be killed immediately when it
> throws an NPE.
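A JVM exits only after every non-daemon thread has terminated; an uncaught exception ends the thread that threw it, not the process. The following minimal Java sketch shows the kind of thread that can keep a failed task JVM alive. It assumes a worker loosely modeled on the DataStreamer loop in the thread dump below (a timed wait on a LinkedList monitor); QueueWorkerDemo and startQueueWorker are hypothetical names, not Hadoop code.

```java
import java.util.LinkedList;

/**
 * Hypothetical sketch (not Hadoop code): a background thread that waits on a
 * queue, loosely modeled on DFSClient$DFSOutputStream$DataStreamer.run().
 */
public class QueueWorkerDemo {

    /** Starts a thread that consumes queue items until it is interrupted. */
    static Thread startQueueWorker(LinkedList<Object> queue, boolean daemon) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    synchronized (queue) {
                        // Timed wait, like the TIMED_WAITING state of "Thread-12".
                        while (queue.isEmpty()) {
                            queue.wait(1000);
                        }
                        queue.removeFirst();
                    }
                }
            } catch (InterruptedException e) {
                // An interrupt is the only way this loop ends.
            }
        });
        // Must be called before start(); afterwards setDaemon throws
        // IllegalThreadStateException.
        worker.setDaemon(daemon);
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        LinkedList<Object> queue = new LinkedList<>();
        // With daemon = false, an uncaught exception thrown from main at this
        // point would leave the JVM running, because a non-daemon thread is
        // still alive. With daemon = true the JVM could exit.
        Thread worker = startQueueWorker(queue, false);
        System.out.println("worker daemon? " + worker.isDaemon());
        worker.interrupt();
        worker.join();
    }
}
```

If main threw a RuntimeException instead of interrupting the worker, the JVM would keep running while the worker sat in TIMED_WAITING, much like the hung reduce tasks described above.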
> The thread dump shows:
> "DestroyJavaVM" prio=10 tid=0x0805f800 nid=0x6b5a waiting on condition [0x00000000..0xbfffcc90]
> java.lang.Thread.State: RUNNABLE
>
> "Thread-12" prio=10 tid=0x083f1400 nid=0x6b87 in Object.wait() [0xa2f37000..0xa2f37eb0]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0xa3af62a0> (a java.util.LinkedList)
> at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1680)
> - locked <0xa3af62a0> (a java.util.LinkedList)
>
> "Comm thread for task_200803181240_0001_r_000000_0" daemon prio=10 tid=0x0841f000 nid=0x6b6f waiting on condition [0xa307c000..0xa307c130]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.mapred.Task$1.run(Task.java:283)
> at java.lang.Thread.run(Unknown Source)
>
> "[EMAIL PROTECTED]" daemon prio=10 tid=0x083fc400 nid=0x6b6d waiting on condition [0xa30cd000..0xa30cd1b0]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:626)
> at java.lang.Thread.run(Unknown Source)
>
> "IPC Client connection to localhost/127.0.0.1:9000" daemon prio=10 tid=0x083f6800 nid=0x6b6c in Object.wait() [0xa311d000..0xa311e030]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0xa4ac0860> (a org.apache.hadoop.ipc.Client$Connection)
> at java.lang.Object.wait(Object.java:485)
> at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:247)
> - locked <0xa4ac0860> (a org.apache.hadoop.ipc.Client$Connection)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:286)
> It looks like the task is waiting for the non-daemon DataStreamer thread
> ("Thread-12" above) to exit. After calling streamer.setDaemon(true), the task
> behaved as expected.
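One way to check this kind of diagnosis from inside a process is to list the live non-daemon threads other than the caller, because those are exactly the threads that block JVM shutdown. The sketch below assumes a hypothetical NonDaemonThreads helper (not part of Hadoop) and uses only the standard Thread.getAllStackTraces() API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical helper (not part of Hadoop): lists live non-daemon threads
 * other than the caller, i.e. the threads that would keep the JVM running
 * after the calling thread dies.
 */
public class NonDaemonThreads {

    static List<Thread> liveNonDaemonThreads() {
        List<Thread> result = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t != Thread.currentThread() && t.isAlive() && !t.isDaemon()) {
                result.add(t);
            }
        }
        return result;
    }

    /** Starts a named thread that blocks until the latch is released. */
    static Thread startBlockedThread(String name, boolean daemon, CountDownLatch release) {
        Thread t = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        // patch-3039.txt makes the equivalent call with true for the DataStreamer.
        t.setDaemon(daemon);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        Thread plain = startBlockedThread("streamer-non-daemon", false, release);
        Thread fixed = startBlockedThread("streamer-daemon", true, release);
        for (Thread t : liveNonDaemonThreads()) {
            System.out.println("would keep JVM alive: " + t.getName());
        }
        release.countDown();
        plain.join();
        fixed.join();
    }
}
```

Run as-is, it should report streamer-non-daemon but not streamer-daemon (plus any other non-daemon threads the environment happens to start), mirroring the effect of the setDaemon(true) call.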