[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaoyunjiong updated MAPREDUCE-5260:
------------------------------------

    Attachment: MAPREDUCE-5260.patch

The root cause of JvmManager running into an inconsistent state is that the 
TaskTracker lacks the user's information, so signalling the task fails:
2013-05-14 07:01:31,482 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201305100625_20199_m_000431_0
2013-05-14 07:01:31,485 INFO org.apache.hadoop.mapred.TaskController: Reading task controller config from /etc/hadoop/taskcontroller.cfg
2013-05-14 07:01:31,485 INFO org.apache.hadoop.mapred.TaskController: User zhaoyunjiong not found
2013-05-14 07:01:31,485 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.io.IOException: Problem signalling task 30048 with TERM; exit = 255
        at org.apache.hadoop.mapred.LinuxTaskController.signalTask(LinuxTaskController.java:319)
        at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.kill(JvmManager.java:555)
        at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvmRunner(JvmManager.java:317)
        at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvm(JvmManager.java:297)
        at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.taskKilled(JvmManager.java:289)
        at org.apache.hadoop.mapred.JvmManager.taskKilled(JvmManager.java:158)
        at org.apache.hadoop.mapred.TaskRunner.kill(TaskRunner.java:801)
        at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:3279)
        at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:3251)
        at org.apache.hadoop.mapred.TaskTracker.purgeTask(TaskTracker.java:2286)
        at org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:2185)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1862)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2646)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3900)


This patch catches the IOException thrown by LinuxTaskController to prevent 
the inconsistent state. 
It also makes sure the TT shuts itself down if it does run into an inconsistent state.
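
To illustrate the idea only (this is not the attached patch), here is a minimal 
sketch of catching the signalling failure so that the bookkeeping is still 
cleaned up. All names here (JvmKillSketch, Signaller, Registry) are hypothetical 
stand-ins and not Hadoop classes; the real logic lives in JvmManager and 
LinuxTaskController.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class JvmKillSketch {

    /** Hypothetical stand-in for a task controller whose signal call may fail. */
    public interface Signaller {
        void signalTask(String user, int pid) throws IOException;
    }

    /** Hypothetical bookkeeping that maps a task attempt id to its JVM pid. */
    public static class Registry {
        private final Map<String, Integer> taskToPid = new HashMap<>();
        private final Signaller signaller;

        public Registry(Signaller signaller) {
            this.signaller = signaller;
        }

        public void register(String taskId, int pid) {
            taskToPid.put(taskId, pid);
        }

        public boolean isTracked(String taskId) {
            return taskToPid.containsKey(taskId);
        }

        /**
         * Tries to signal the JVM of the given task. The map entry is removed
         * even when signalling throws, so the bookkeeping cannot be left
         * half-updated. Returns true only if the signal was delivered.
         */
        public boolean kill(String user, String taskId) {
            Integer pid = taskToPid.get(taskId);
            if (pid == null) {
                return false; // nothing registered for this task
            }
            boolean signalled = true;
            try {
                signaller.signalTask(user, pid);
            } catch (IOException e) {
                // Assumption: logging and carrying on is acceptable here.
                signalled = false;
                System.err.println("Could not signal pid " + pid + ": " + e.getMessage());
            } finally {
                taskToPid.remove(taskId);
            }
            return signalled;
        }
    }

    public static void main(String[] args) {
        // A signaller that always fails, mimicking the "user not found" case.
        Registry registry = new Registry((user, pid) -> {
            throw new IOException("User " + user + " not found");
        });
        registry.register("attempt_example", 30048);
        boolean delivered = registry.kill("zhaoyunjiong", "attempt_example");
        System.out.println("delivered=" + delivered
                + ", stillTracked=" + registry.isTracked("attempt_example"));
    }
}
```

In this sketch the cleanup sits in a finally block, so a failed signal leaves 
the process possibly still alive but never leaves a stale entry behind. Whether 
to continue or to shut the daemon down at that point is a separate decision.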
                
> Job failed because of JvmManager running into inconsistent state
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5260
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5260
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.1.2
>            Reporter: zhaoyunjiong
>             Fix For: 1.1.3
>
>         Attachments: MAPREDUCE-5260.patch
>
>
> In our cluster, jobs failed because task initialization failed at random: 
> JvmManager ran into an inconsistent state and the TaskTracker failed to 
> exit:
> java.lang.Throwable: Child Error
>       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.lang.NullPointerException
>       at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.getDetails(JvmManager.java:402)
>       at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:387)
>       at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:192)
>       at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:125)
>       at org.apache.hadoop.mapred.TaskRunner.launchJvmAndWait(TaskRunner.java:292)
>       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:251)
> -------
> java.lang.Throwable: Child Error
>       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.lang.NullPointerException
>       at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.getDetails(JvmManager.java:402)
>       at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:387)
>       at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:192)
>       at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:125)
>       at org.apache.hadoop.mapred.TaskRunner.launchJvmAndWait(TaskRunner.java:292)
>       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:251)

