[ https://issues.apache.org/jira/browse/HADOOP-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643197#action_12643197 ]
Steve Loughran commented on HADOOP-4532:
----------------------------------------
This is very race-condition dependent; with a small change in machine timing,
the interrupt is handled gracefully by reporting failure to the caller.
[sf-startdaemon-debug] 08/10/28 13:15:07 [Thread-6] WARN util.Shell : Interrupted while reading the error stream
[sf-startdaemon-debug] java.lang.InterruptedException
[sf-startdaemon-debug] at java.lang.Object.wait(Native Method)
[sf-startdaemon-debug] at java.lang.Thread.join(Thread.java:1143)
[sf-startdaemon-debug] at java.lang.Thread.join(Thread.java:1196)
[sf-startdaemon-debug] at org.apache.hadoop.util.Shell.runCommand(Shell.java:189)
[sf-startdaemon-debug] at org.apache.hadoop.util.Shell.run(Shell.java:134)
[sf-startdaemon-debug] at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
[sf-startdaemon-debug] at org.apache.hadoop.util.Shell.execCommand(Shell.java:338)
[sf-startdaemon-debug] at org.apache.hadoop.security.UnixUserGroupInformation.executeShellCommand(UnixUserGroupInformation.java:326)
[sf-startdaemon-debug] at org.apache.hadoop.security.UnixUserGroupInformation.getUnixUserName(UnixUserGroupInformation.java:305)
[sf-startdaemon-debug] at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:232)
[sf-startdaemon-debug] at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
[sf-startdaemon-debug] at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
[sf-startdaemon-debug] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setConfigurationParameters(FSNamesystem.java:426)
[sf-startdaemon-debug] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:304)
[sf-startdaemon-debug] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:290)
[sf-startdaemon-debug] at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:165)
[sf-startdaemon-debug] at org.apache.hadoop.hdfs.server.namenode.NameNode.innerStart(NameNode.java:226)
Given that I am interrupting threads during their initialization, I shouldn't
expect things to work, but stopping the JVM is probably inappropriate.
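For reference, the graceful path amounts to the pattern sketched below: join the
stderr-reader thread, and if the wait is interrupted, restore the interrupt status
and surface the failure to the caller. The class and method names are my own and
this is only a sketch of the pattern, not the actual Shell.runCommand code.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class InterruptAwareRunner {

  // Run a command, draining stderr on a helper thread. If the calling thread
  // is interrupted while waiting, restore the interrupt status and report the
  // failure to the caller as an IOException instead of doing anything drastic.
  public static int run(String... command) throws IOException {
    final Process process = new ProcessBuilder(command).start();
    Thread errThread = new Thread() {
      public void run() {
        try {
          BufferedReader err = new BufferedReader(
              new InputStreamReader(process.getErrorStream()));
          while (err.readLine() != null) {
            // drain (or log) stderr so the child cannot block on a full pipe
          }
        } catch (IOException ignored) {
          // nothing useful to do if reading stderr fails
        }
      }
    };
    errThread.start();
    try {
      int exitCode = process.waitFor();
      errThread.join();
      return exitCode;
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();  // preserve the interrupt for the caller
      throw new IOException("Interrupted while running " + command[0]);
    } finally {
      process.destroy();
    }
  }
}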
> Interrupting the namenode thread triggers System.exit()
> -------------------------------------------------------
>
> Key: HADOOP-4532
> URL: https://issues.apache.org/jira/browse/HADOOP-4532
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.20.0
> Reporter: Steve Loughran
> Priority: Minor
>
> My service setup/teardown tests are managing to trigger system exits in the
> namenode, which seems overkill.
> 1. Interrupting the thread that is starting up the namesystem raises a
> java.nio.channels.ClosedByInterruptException.
> 2. This is caught in FSImage.rollFSImage and handed off to processIOError.
> 3. That triggers a call to Runtime.getRuntime().exit(-1) with the message
> "All storage directories are inaccessible."
> Stack trace to follow. Exiting the JVM is somewhat overkill; if someone has
> interrupted the thread it is (presumably) because they want to stop the
> namenode, which may not imply they want to kill the JVM at the same time.
> Certainly JUnit does not expect it.
> Some possibilities:
> - ClosedByInterruptException gets handled differently, as some form of
> shutdown request.
> - Calls to System.exit() are factored out into something whose behaviour can
> be changed by policy options to throw a RuntimeException instead (a rough
> sketch follows below).
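> As a rough illustration of that second possibility (hypothetical names, not
> existing Hadoop code), the exit could go through a small indirection whose
> policy a test harness can override:
>
> public final class ExitPolicy {
>   // When true, terminate() throws instead of killing the JVM; a test
>   // harness can flip this, while production code keeps the default.
>   private static volatile boolean throwInsteadOfExit = false;
>
>   public static void setThrowInsteadOfExit(boolean value) {
>     throwInsteadOfExit = value;
>   }
>
>   public static void terminate(int status, String message) {
>     if (throwInsteadOfExit) {
>       throw new RuntimeException("Exit requested: status=" + status + ", " + message);
>     }
>     Runtime.getRuntime().exit(status);
>   }
> }
>
> processIOError could then call ExitPolicy.terminate(-1, "All storage
> directories are inaccessible.") rather than Runtime.getRuntime().exit(-1),
> and a test could switch the policy to get an exception it can assert on.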
> Hosting a Namenode under a security manager that blocks System.exit() is the
> simplest workaround, but it means that what would have been a straight exit
> now gets turned into an exception, so callers may be surprised by what
> happens.
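> A minimal sketch of such a security manager (illustrative only; the class
> name is my own):
>
> public class NoExitSecurityManager extends SecurityManager {
>   @Override
>   public void checkExit(int status) {
>     // Turn any attempt to exit the JVM into an exception the test can catch.
>     throw new SecurityException("Blocked System.exit(" + status + ")");
>   }
>
>   @Override
>   public void checkPermission(java.security.Permission perm) {
>     // Permit everything else.
>   }
> }
>
> The test harness would install it with
> System.setSecurityManager(new NoExitSecurityManager()) before starting the
> namenode, and remove it again during teardown.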