Graeme Seaton created CONNECTORS-898:
----------------------------------------
Summary: Agents fail to start if ZK ensemble member missing
Key: CONNECTORS-898
URL: https://issues.apache.org/jira/browse/CONNECTORS-898
Project: ManifoldCF
Issue Type: Bug
Components: Framework agents process
Affects Versions: ManifoldCF 1.5
Environment: 4 Agents; 3 member ZK ensemble (2 live, 1 dead)
Reporter: Graeme Seaton
If one member of the ZK ensemble is down but a majority of members is still
active, so that ZK is 'live', then when the agents start up, any agent that
tries to connect to the missing member aborts with:
Opening socket connection to server overlorddev03/10.250.0.36:2181. Will not attempt to authenticate using SASL (unknown error)
71 [main-SendThread(overlorddev03:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
followed by:
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Initialization failed: KeeperErrorCode = ConnectionLoss for /org.apache.manifoldcf.configuration
    at org.apache.manifoldcf.core.system.ManifoldCF.initializeEnvironment(ManifoldCF.java:269)
    at org.apache.manifoldcf.agents.system.ManifoldCF.initializeEnvironment(ManifoldCF.java:43)
    at org.apache.manifoldcf.agents.BaseAgentsInitializationCommand.execute(BaseAgentsInitializationCommand.java:36)
    at org.apache.manifoldcf.agents.AgentRun.main(AgentRun.java:93)
This has a knock-on effect on the other agents, which then eventually fail with
'agents process could not start - shutting down'.
Besides exceptions of this type:
5401 [main-SendThread(overlorddev03:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server overlorddev03/10.250.0.36:2181. Will not attempt to authenticate using SASL (unknown error)
5403 [main-SendThread(overlorddev03:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
5506 [main-SendThread(overlorddev04:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server overlorddev04/10.250.0.46:2181. Will not attempt to authenticate using SASL (unknown error)
5507 [main-SendThread(overlorddev04:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to overlorddev04/10.250.0.46:2181, initiating session
the only other notable exception is:
5509 [main-SendThread(overlorddev04:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server overlorddev04/10.250.0.46:2181, sessionid = 0x4444f2cb0590087, negotiated timeout = 8000
org.apache.manifoldcf.core.interfaces.ManifoldCFException: KeeperErrorCode = ConnectionLoss for /org.apache.manifoldcf.flags-_AGENTRUN_
    at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.checkGlobalFlag(ZooKeeperConnection.java:499)
    at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.checkGlobalFlag(ZooKeeperLockManager.java:787)
    at org.apache.manifoldcf.agents.system.AgentsDaemon.runAgents(AgentsDaemon.java:110)
    at org.apache.manifoldcf.agents.AgentRun.doExecute(AgentRun.java:64)
    at org.apache.manifoldcf.agents.BaseAgentsInitializationCommand.execute(BaseAgentsInitializationCommand.java:37)
    at org.apache.manifoldcf.agents.AgentRun.main(AgentRun.java:93)
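For illustration only, here is a minimal sketch of the kind of handling that would avoid the abort: treat ConnectionLoss as transient and retry the operation, so the client has a chance to move on to a live ensemble member. This is not ManifoldCF or ZooKeeper code. The class ConnectionLossRetrySketch, the exception TransientConnectionLoss, and the method withRetry are all hypothetical names, and the failing call is simulated rather than made against a real ensemble.

```java
import java.util.concurrent.Callable;

public class ConnectionLossRetrySketch {

    /** Hypothetical stand-in for a transient "ConnectionLoss" error; not a real ZooKeeper class. */
    static class TransientConnectionLoss extends Exception {
        TransientConnectionLoss(String message) {
            super(message);
        }
    }

    /**
     * Runs op, retrying when it reports connection loss.
     * Gives up and rethrows the last failure after maxAttempts attempts.
     */
    static <T> T withRetry(Callable<T> op, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TransientConnectionLoss last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (TransientConnectionLoss e) {
                last = e;
                // Pause before the next attempt, but not after the final one.
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated scenario: the first attempt hits the dead member,
        // the second attempt reaches a live one.
        final int[] calls = {0};
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] == 1) {
                throw new TransientConnectionLoss(
                    "ConnectionLoss for /org.apache.manifoldcf.configuration");
            }
            return "configuration loaded";
        }, 3, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In this sketch only the simulated connection-loss exception is retried; any other exception propagates immediately, so genuine configuration errors would still stop startup.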