[
https://issues.apache.org/jira/browse/HDFS-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303046#comment-14303046
]
Hadoop QA commented on HDFS-7727:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12696139/check_config_SshFenceByTcpPort.1.patch
against trunk revision 8cb4731.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-common-project/hadoop-common.
Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/9415//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9415//console
This message is automatically generated.
> Check and verify the auto-fence settings to prevent failures of auto-failover
> -----------------------------------------------------------------------------
>
> Key: HDFS-7727
> URL: https://issues.apache.org/jira/browse/HDFS-7727
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: auto-failover
> Affects Versions: 2.4.1, 2.6.0, 2.5.1
> Reporter: Tianyin Xu
> Attachments: check_config_SshFenceByTcpPort.1.patch
>
>
> ============================
> Problem
> -------------------------------------------------
> Currently, the auto-failover feature of HDFS only checks the settings of the
> parameter "dfs.ha.fencing.methods" but not the settings of the other
> "dfs.ha.fencing.*" parameters.
> Basically, the configuration settings of the other "dfs.ha.fencing.*"
> parameters are not checked or verified at initialization, but are parsed and
> applied directly at runtime. Any configuration error there prevents the
> auto-failover.
> Because these values are only used to handle failures (auto-failover), you
> won't notice the errors until the active NameNode fails and triggers the
> fencing procedure of the auto-failover process.
> ============================
> Parameters
> -------------------------------------------------
> In SSHFence, there are two configuration parameters defined in
> SshFenceByTcpPort.java
> "dfs.ha.fencing.ssh.connect-timeout";
> "dfs.ha.fencing.ssh.private-key-files"
> They are used in the tryFence() function for auto-fencing.
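For reference, these parameters would typically be set in hdfs-site.xml; the values below are illustrative only, not taken from the patch or the report:

```xml
<!-- Illustrative hdfs-site.xml fragment; paths and values are examples only -->
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value> <!-- milliseconds -->
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
```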
> Any erroneous setting of these two parameters results in uncaught exceptions
> that prevent the fencing and hence the auto-failover. We verified this by
> setting up a two-NameNode auto-failover cluster and manually killing the
> active NameNode: the standby NameNode could not take over.
> For "dfs.ha.fencing.ssh.connect-timeout", the erroneous settings include
> ill-formatted and negative integers (the value is passed to Thread.join(),
> which rejects negative timeouts).
> For "dfs.ha.fencing.ssh.private-key-files", the erroneous settings include a
> non-existent private-key file path or wrong file permissions, either of which
> makes jsch.addIdentity() fail in the createSession() method.
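As a plain-JDK sketch (no Hadoop dependency; the class and method names here are ours, not from the patch), this is why both failure modes of the timeout value only surface when the fencing thread actually runs:

```java
// Minimal sketch of the two connect-timeout failure modes described above:
// an ill-formatted value fails integer parsing, and a negative value is
// rejected by Thread.join() with an IllegalArgumentException.
public class TimeoutFailureDemo {
    static String describe(String configured) {
        try {
            // Ill-formatted values throw NumberFormatException here.
            long timeout = Long.parseLong(configured);
            Thread t = new Thread(() -> {});
            t.start();
            // Negative values throw IllegalArgumentException here,
            // i.e. only once fencing is already in progress.
            t.join(timeout);
            return "ok";
        } catch (NumberFormatException e) {
            return "ill-formatted";
        } catch (IllegalArgumentException e) {
            return "negative";
        } catch (InterruptedException e) {
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("30000")); // ok
        System.out.println(describe("30s"));   // ill-formatted
        System.out.println(describe("-1"));    // negative
    }
}
```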
> The following gives one example of the failure caused by misconfiguring the
> "dfs.ha.fencing.ssh.private-key-files" parameter.
> {code}
> 2015-02-02 23:38:32,960 INFO org.apache.hadoop.ha.NodeFencer: ======
> Beginning Service Fencing Process... ======
> 2015-02-02 23:38:32,960 INFO org.apache.hadoop.ha.NodeFencer: Trying method
> 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
> 2015-02-02 23:38:32,960 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable
> to create SSH session
> com.jcraft.jsch.JSchException: java.io.FileNotFoundException:
> /home/hadoop/.ssh/id_rsax (No such file or directory)
> at com.jcraft.jsch.IdentityFile.newInstance(IdentityFile.java:98)
> at com.jcraft.jsch.JSch.addIdentity(JSch.java:206)
> at com.jcraft.jsch.JSch.addIdentity(JSch.java:192)
> at
> org.apache.hadoop.ha.SshFenceByTcpPort.createSession(SshFenceByTcpPort.java:122)
> at
> org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:91)
> at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
> at
> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
> at
> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
> at
> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
> at
> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
> at
> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:901)
> at
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:800)
> at
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> Caused by: java.io.FileNotFoundException: /home/hadoop/.ssh/id_rsax (No such
> file or directory)
> at java.io.FileInputStream.open(Native Method)
> at java.io.FileInputStream.<init>(FileInputStream.java:146)
> at java.io.FileInputStream.<init>(FileInputStream.java:101)
> at com.jcraft.jsch.IdentityFile.newInstance(IdentityFile.java:83)
> ... 14 more
> {code}
> ============================
> Solution (the patch)
> -------------------------------------------------
> Check the configuration settings in the checkArgs() function. Currently,
> checkArgs() only checks the settings of the parameter "dfs.ha.fencing.methods"
> but not the settings of the other "dfs.ha.fencing.*" parameters.
> {code:title=SshFenceByTcpPort.java|borderStyle=solid}
>   /**
>    * Verify that the argument, if given, in the conf is parseable.
>    */
>   @Override
>   public void checkArgs(String argStr) throws
>       BadFencingConfigurationException {
>     if (argStr != null) {
>       new Args(argStr);
>     }
>     // <= insert the checkers here (see the patch attached)
>   }
> {code}
> The detailed patch is shown below.
> {code}
> @@ -76,6 +77,23 @@
>      if (argStr != null) {
>        new Args(argStr);
>      }
> +
> +    // The configuration could be empty (e.g., called from DFSHAAdmin)
> +    if (getConf().size() > 0) {
> +      // check dfs.ha.fencing.ssh.connect-timeout
> +      if (getSshConnectTimeout() <= 0)
> +        throw new BadFencingConfigurationException(
> +            CONF_CONNECT_TIMEOUT_KEY +
> +            " property value should be positive");
> +
> +      // check the settings of dfs.ha.fencing.ssh.private-key-files
> +      for (String keyFilePath : getKeyFiles()) {
> +        File keyFile = new File(keyFilePath);
> +        if (!keyFile.isFile() || !keyFile.canRead())
> +          throw new BadFencingConfigurationException(
> +              "The configured private key file is invalid: " + keyFilePath);
> +      }
> +    }
>    }
>
> @Override
> {code}
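The key-file half of the check can be exercised on its own. The helper below (our own name, not part of the patch) applies the same isFile()/canRead() predicate the patch uses, collecting the offending paths instead of throwing:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyFileCheck {
    // Same predicate as the patch: a key file is valid only if it is a
    // readable regular file. Returns the offending paths.
    static List<String> invalidKeyFiles(List<String> paths) {
        List<String> bad = new ArrayList<>();
        for (String p : paths) {
            File f = new File(p);
            if (!f.isFile() || !f.canRead()) {
                bad.add(p);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // A readable temp file passes; a misspelled path (cf. the id_rsax
        // path in the log above) is flagged.
        File ok = Files.createTempFile("id_rsa_demo", null).toFile();
        ok.deleteOnExit();
        System.out.println(invalidKeyFiles(
            Arrays.asList(ok.getPath(), "/home/hadoop/.ssh/id_rsax")));
    }
}
```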
> Thanks!
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)