[
https://issues.apache.org/jira/browse/MAPREDUCE-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhaoyunjiong updated MAPREDUCE-5260:
------------------------------------
Attachment: MAPREDUCE-5260.patch
The root cause of JvmManager running into inconsistent state is TaskTracker
lack of user information:
2013-05-14 07:01:31,482 INFO org.apache.hadoop.mapred.TaskTracker: About to
purge task: attempt_201305100625_20199_m_000431_0
2013-05-14 07:01:31,485 INFO org.apache.hadoop.mapred.TaskController: Reading
task controller config from /etc/hadoop/taskcontroller.cfg
2013-05-14 07:01:31,485 INFO org.apache.hadoop.mapred.TaskController: User
zhaoyunjiong not found
2013-05-14 07:01:31,485 ERROR org.apache.hadoop.mapred.TaskTracker: Caught
exception: java.io.IOException: Problem signalling task 30048 with TERM; exit =
255
at
org.apache.hadoop.mapred.LinuxTaskController.signalTask(LinuxTaskController.java:319)
at
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.kill(JvmManager.java:555)
at
org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvmRunner(JvmManager.java:317)
at
org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvm(JvmManager.java:297)
at
org.apache.hadoop.mapred.JvmManager$JvmManagerForType.taskKilled(JvmManager.java:289)
at org.apache.hadoop.mapred.JvmManager.taskKilled(JvmManager.java:158)
at org.apache.hadoop.mapred.TaskRunner.kill(TaskRunner.java:801)
at
org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:3279)
at
org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:3251)
at org.apache.hadoop.mapred.TaskTracker.purgeTask(TaskTracker.java:2286)
at
org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:2185)
at
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1862)
at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2646)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3900)
This patch catch IOException throwed by LinuxTaskController to prevent
inconsistent state.
Also it make sure TT will shutdown itself when running into inconsistent state.
> Job failed because of JvmManager running into inconsistent state
> ----------------------------------------------------------------
>
> Key: MAPREDUCE-5260
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5260
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Affects Versions: 1.1.2
> Reporter: zhaoyunjiong
> Fix For: 1.1.3
>
> Attachments: MAPREDUCE-5260.patch
>
>
> In our cluster, jobs failed due to randomly task initialization failed
> because of JvmManager running into inconsistent state and TaskTracker failed
> to exit:
> java.lang.Throwable: Child Error
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.lang.NullPointerException
> at
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType.getDetails(JvmManager.java:402)
> at
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:387)
> at
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:192)
> at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:125)
> at
> org.apache.hadoop.mapred.TaskRunner.launchJvmAndWait(TaskRunner.java:292)
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:251)
> -------
> java.lang.Throwable: Child Error
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.lang.NullPointerException
> at
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType.getDetails(JvmManager.java:402)
> at
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:387)
> at
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:192)
> at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:125)
> at
> org.apache.hadoop.mapred.TaskRunner.launchJvmAndWait(TaskRunner.java:292)
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:251)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira