Arun Suresh created HADOOP-10412:
------------------------------------
Summary: First call from Client fails after Server restart
Key: HADOOP-10412
URL: https://issues.apache.org/jira/browse/HADOOP-10412
Project: Hadoop Common
Issue Type: Bug
Components: ipc
Affects Versions: 2.2.0
Environment: Linux : centos62-2 2.6.32-220.el6.x86_64,
jdk : 1.7.0_15
Reporter: Arun Suresh
This seems to happen only for ProtobufRpc-based services; I could not reproduce it
using simple WritableRpc.
Steps to reproduce:
Consider the case of NameNode HA failover. nn1 and nn2 are both NameNodes; nn1
is 'active' and nn2 is 'standby'.
1) Bring down the nn1 process. nn2 is now active.
2) Bring the nn1 process back up. nn1 is now standby and nn2 is active.
3) Manually issue a failover using the command:
{quote}
$ hdfs haadmin -failover nn2 nn1
{quote}
The first call always fails with the following exception:
{quote}
Operation failed: Failed to become active. Couldn't make NameNode at centos62-2/192.168.2.202:8020 active
java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "centos62-2/192.168.2.202"; destination host is: "centos62-2":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy8.transitionToActive(Unknown Source)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToActive(HAServiceProtocolClientSideTranslatorPB.java:100)
at org.apache.hadoop.ha.HAServiceProtocolHelper.transitionToActive(HAServiceProtocolHelper.java:48)
at org.apache.hadoop.ha.ZKFailoverController.becomeActive(ZKFailoverController.java:373)
at org.apache.hadoop.ha.ZKFailoverController.access$900(ZKFailoverController.java:59)
at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.becomeActive(ZKFailoverController.java:818)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:803)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)
at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:673)
at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:59)
at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:592)
at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:589)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ha.ZKFailoverController.gracefulFailoverToYou(ZKFailoverController.java:589)
at org.apache.hadoop.ha.ZKFCRpcServer.gracefulFailover(ZKFCRpcServer.java:94)
at org.apache.hadoop.ha.protocolPB.ZKFCProtocolServerSideTranslatorPB.gracefulFailover(ZKFCProtocolServerSideTranslatorPB.java:61)
at org.apache.hadoop.ha.proto.ZKFCProtocolProtos$ZKFCProtocolService$2.callBlockingMethod(ZKFCProtocolProtos.java:1548)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
{quote}
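The innermost cause in the trace is an EOFException thrown by DataInputStream.readInt inside Client$Connection.receiveRpcResponse. In other words, the client reached end-of-stream while waiting for the start of the response, presumably because the server end of that connection had already gone away. The standalone sketch below uses no Hadoop classes and does not reproduce the bug. It only illustrates the JDK behaviour involved: readInt() throws EOFException when the stream ends before four bytes arrive. The class and method names (ReadIntEofDemo, readLengthPrefix) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

class ReadIntEofDemo {
    /**
     * Reads a 4-byte big-endian int from the given bytes, the way an RPC
     * client might read a response header. Returns -1 if the stream ends
     * before four bytes are available (readInt throws EOFException then).
     */
    static int readLengthPrefix(byte[] wireBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wireBytes))) {
            return in.readInt();
        } catch (EOFException e) {
            // Same exception type as the "Caused by" line in the report.
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A complete int: 0x00000010 == 16.
        System.out.println(readLengthPrefix(new byte[] {0, 0, 0, 16}));
        // No bytes at all, as when the peer has already closed the connection.
        System.out.println(readLengthPrefix(new byte[0]));
    }
}
```

Running main prints 16 followed by -1.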
The call succeeds if I issue the same command again.
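Since a second attempt succeeds, one possible client-side mitigation is to retry once when a failure is caused by a closed connection. The sketch below assumes a hypothetical RpcCall interface standing in for a single RPC invocation. It is not a Hadoop API and is not the fix adopted for this issue. Because the trace shows the EOFException arriving as the cause of an IOException (via NetUtils.wrapException), the helper walks the cause chain instead of catching EOFException directly. All names (RetryOnEofDemo, RpcCall, callWithOneRetry, isConnectionClosed) are hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

class RetryOnEofDemo {
    /** Hypothetical stand-in for a single RPC invocation; not a Hadoop interface. */
    interface RpcCall<T> {
        T invoke() throws IOException;
    }

    /** True if t, or any exception in its cause chain, is an EOFException. */
    static boolean isConnectionClosed(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof EOFException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the call once; if it fails because the connection was closed,
     * runs it a second and final time. Other IOExceptions, and any failure
     * of the second attempt, propagate to the caller.
     */
    static <T> T callWithOneRetry(RpcCall<T> call) throws IOException {
        try {
            return call.invoke();
        } catch (IOException e) {
            if (!isConnectionClosed(e)) {
                throw e;
            }
            return call.invoke();
        }
    }

    /** Simulates the reported pattern (first call fails, retry succeeds); returns attempts used. */
    static int attemptsForFailingFirstCall() {
        AtomicInteger attempts = new AtomicInteger();
        try {
            callWithOneRetry(() -> {
                if (attempts.incrementAndGet() == 1) {
                    // Mimics the wrapped exception seen in the report.
                    throw new IOException("Failed on local exception: java.io.EOFException",
                            new EOFException());
                }
                return "ok";
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        System.out.println("attempts used: " + attemptsForFailingFirstCall());
    }
}
```

A blind retry like this is only reasonable for idempotent operations; whether repeating transitionToActive is safe would need to be checked separately.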
--
This message was sent by Atlassian JIRA
(v6.2#6252)