[
https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037211#comment-14037211
]
Vinayakumar B commented on HADOOP-10722:
----------------------------------------
Ideally, fencing methods should be configured so that multiple writers are
never allowed on the same shared storage.
QJM provides fencing on its own, i.e. it won't allow multiple writers at a
time, so external fencing methods need not be configured.
You can remove the SSH fencing method from the configuration on both machines
and restart the cluster.
Failover will then happen successfully.
To skip the SSH fence, just set the fencing methods configuration as below.
{code:xml}<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>{code}
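In the quoted ZKFC log, SSH fencing is the only method configured (1/1). It fails with NoRouteToHostException because the old active host is powered off, so fencing never succeeds and the standby is not promoted.
If you would rather keep SSH fencing for the case where the old active machine is still reachable, a possible variant is sketched below (not tested on this cluster). It lists sshfence first and shell(/bin/true) as a fallback; the methods are given one per line and are tried in order until one reports success. The private key path is only a placeholder assumed for illustration.
{code:xml}<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence
shell(/bin/true)</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<!-- placeholder path, replace with the key of the fencing user -->
<value>/home/myuser/.ssh/id_rsa</value>
</property>{code}
With this ordering, a powered-off active NN would still cause sshfence to fail, but the fallback would then succeed and let the failover proceed, relying on QJM to prevent a second writer.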
> Standby NN continuing as standby when active NN machine got shutdown.
> ---------------------------------------------------------------------
>
> Key: HADOOP-10722
> URL: https://issues.apache.org/jira/browse/HADOOP-10722
> Project: Hadoop Common
> Issue Type: Bug
> Components: auto-failover, ha
> Affects Versions: 2.4.0
> Reporter: surendra singh lilhore
>
> I have an HA cluster with 3 ZK and 3 QJM.
> My active NN machine was shut down, but the standby NN is still in standby.
> It should have become active.
> ZKFC logs
> ========
> {noformat}
> 2014-06-19 13:39:30,810 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
> 2014-06-19 13:39:30,810 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
> 2014-06-19 13:39:30,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to host-10-18-40-101...
> 2014-06-19 13:39:30,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to host-10-18-40-101 port 22
> 2014-06-19 13:39:33,814 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to host-10-18-40-101 as user myuser
> com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host
>     at com.jcraft.jsch.Util.createSocket(Util.java:386)
>     at com.jcraft.jsch.Session.connect(Session.java:182)
>     at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:901)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:800)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2014-06-19 13:39:33,814 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
> {noformat}