[
https://issues.apache.org/jira/browse/HADOOP-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amareshwari Sriramadasu updated HADOOP-5198:
--------------------------------------------
Attachment: patch-5198.txt
Attaching a patch that adds a not-null check for the pid before passing it to
kill-process.
The NPE occurs when the jvmIdToRunner map still contains the runner but its pid
file has already been cleaned up.
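To illustrate why a null pid surfaces as an NPE inside ProcessBuilder.start(),
here is a minimal, standalone sketch. It assumes the pid ends up as one element
of the command array (the "kill -0" arguments below are an assumption for
illustration, not copied from ProcessTree); the class and method names are
hypothetical.

```java
import java.io.IOException;

public class NullArgDemo {
    /** Returns true if ProcessBuilder.start() rejects a null command element with an NPE. */
    static boolean throwsNpeOnNullArg() {
        // Stands in for a pid whose pid file has already been cleaned up.
        String pid = null;
        // Assumed command shape; the real arguments built by ProcessTree may differ.
        ProcessBuilder builder = new ProcessBuilder("kill", "-0", pid);
        try {
            // start() validates the command list and throws NullPointerException
            // if any element is null, before any process is launched.
            builder.start();
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (IOException e) {
            // Not expected here; the null check happens before process creation.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on null argument: " + throwsNpeOnNullArg());
    }
}
```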
The scenario: successful tasks (reducers or cleanup attempts) clean up their
files as early as possible, and the jvmIdToRunner entry is deleted in
updateOnJvmExit. If a LaunchTaskAction arrives before the updateOnJvmExit call,
reapJvm for the new task still finds the jvmRunner and tries to kill it, which
causes the NPE on the null pid.
So one solution is for reapJvm to skip killing the process when the pid is null,
because the task has already reported done and the JVM is on its way to exit
(in all cases).
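A rough sketch of the guard, for discussion only. This is not the code in
patch-5198.txt: Runner, killProcess and reap are hypothetical stand-ins for
JvmRunner, the kill-process call and reapJvm, and the kill is recorded in a
list instead of signalling a real process.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReapSketch {
    /** Hypothetical stand-in for JvmRunner; it only models the pid. */
    static final class Runner {
        final String pid; // null once the task has cleaned up its pid file

        Runner(String pid) {
            this.pid = pid;
        }
    }

    // Mirrors the jvmIdToRunner map: an entry can outlive the pid file
    // until updateOnJvmExit removes it.
    final Map<String, Runner> jvmIdToRunner = new HashMap<>();

    // Records the pids that would have been killed.
    final List<String> killedPids = new ArrayList<>();

    /** Hypothetical kill hook; records the pid instead of signalling a process. */
    void killProcess(String pid) {
        killedPids.add(pid);
    }

    /** Returns true if a kill was issued, false if it was skipped. */
    boolean reap(String jvmId) {
        Runner runner = jvmIdToRunner.get(jvmId);
        if (runner == null) {
            return false; // entry already removed, nothing to reap
        }
        if (runner.pid == null) {
            // The not-null check: the task has reported done and the JVM
            // is already exiting, so there is nothing to kill.
            return false;
        }
        killProcess(runner.pid);
        return true;
    }
}
```

With a runner whose pid is null still present in the map, reap returns false
and no kill is attempted; a runner with a live pid is killed as before.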
Thoughts?
> NPE in Shell.runCommand()
> -------------------------
>
> Key: HADOOP-5198
> URL: https://issues.apache.org/jira/browse/HADOOP-5198
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred, util
> Affects Versions: 0.21.0
> Reporter: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: patch-5198.txt
>
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> I have seen a task fail with the following exception:
> java.lang.NullPointerException
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:441)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
> at org.apache.hadoop.util.Shell.run(Shell.java:134)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
> at org.apache.hadoop.util.ProcessTree.isAlive(ProcessTree.java:244)
> at org.apache.hadoop.util.ProcessTree.sigKillInCurrentThread(ProcessTree.java:67)
> at org.apache.hadoop.util.ProcessTree.sigKill(ProcessTree.java:115)
> at org.apache.hadoop.util.ProcessTree.destroyProcessGroup(ProcessTree.java:164)
> at org.apache.hadoop.util.ProcessTree.destroy(ProcessTree.java:180)
> at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.kill(JvmManager.java:377)
> at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:249)
> at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
> at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:76)
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:411)
--
This message is automatically generated by JIRA.