[
https://issues.apache.org/jira/browse/CONNECTORS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907411#comment-13907411
]
Karl Wright commented on CONNECTORS-898:
----------------------------------------
I have seen ConnectionLoss errors when the ZooKeeper server is configured with
too low a limit on maximum connections. See the zookeeper.cfg file included
with the example for ideas on how to address this, if that is what is happening.
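
As a rough illustration of the setting referred to above: ZooKeeper limits
concurrent connections per client IP address on each ensemble member through
the maxClientCnxns property, so several agents processes running on one host
count against the same limit. The value below is an assumption chosen for
illustration only, not a figure taken from this issue or from the example
configuration.

```properties
# Server-side ZooKeeper configuration fragment (zookeeper.cfg / zoo.cfg).
# Maximum concurrent connections a single client IP may open to this
# ensemble member. A value of 0 removes the limit entirely.
# 200 is an illustrative assumption; size it to the number of processes
# and pooled connections you actually expect per host.
maxClientCnxns=200
```

Each ensemble member reads its own configuration file, so a change like this
would need to be applied to every member and the servers restarted.
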

My understanding is that the ZooKeeper client is supposed to reconnect
automatically when an ensemble member goes away. The fact that it is not doing
so suggests a ZooKeeper bug or a configuration problem. I have searched for
similar error messages in conjunction with ZooKeeper but only found reports
from other packages, e.g. Hadoop and Apache Curator, with no solutions given.
So if you know what the actual problem or solution is, please let me know.
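
One possible caller-side mitigation, offered only as a sketch and not as what
ManifoldCF does: treat ConnectionLoss as recoverable and retry the operation
with a growing delay while the client moves on to another ensemble member.
To keep the example compilable without the ZooKeeper jar, ConnectionLossStandIn
is a hypothetical stand-in for the real connection-loss exception, and
RetryOnConnectionLoss and callWithRetry are names invented for this sketch.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for ZooKeeper's connection-loss exception. */
class ConnectionLossStandIn extends Exception {
    ConnectionLossStandIn(String message) {
        super(message);
    }
}

final class RetryOnConnectionLoss {

    /**
     * Runs op, retrying when it fails with the stand-in connection-loss
     * exception. The delay doubles after each failed attempt. Any other
     * exception, or connection loss on the last attempt, is rethrown.
     */
    static <T> T callWithRetry(Callable<T> op, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (ConnectionLossStandIn e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: the caller decides whether to abort
                }
                Thread.sleep(delayMs);
                delayMs *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation: fails twice with connection loss, then succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionLossStandIn("simulated ConnectionLoss");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In real code the lambda would wrap the ZooKeeper call that currently fails
during initialization. Retrying is only safe for operations that can be
repeated without side effects, such as reads and existence checks.
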
> Agents fail to start if ZK ensemble member missing
> --------------------------------------------------
>
> Key: CONNECTORS-898
> URL: https://issues.apache.org/jira/browse/CONNECTORS-898
> Project: ManifoldCF
> Issue Type: Bug
> Components: Framework agents process
> Affects Versions: ManifoldCF 1.5
> Environment: 4 Agents
> 3 member ZK ensemble (2 live, 1 dead)
> Reporter: Graeme Seaton
>
> If a member of the ZK ensemble is down but a majority of members is still
> active, so that ZK is 'live', then when the agents start up, any agents
> that try to connect to the missing member abort with:
> Opening socket connection to server overlorddev03/10.250.0.36:2181. Will not attempt to authenticate using SASL (unknown error)
> 71 [main-SendThread(overlorddev03:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> followed by:
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Initialization failed: KeeperErrorCode = ConnectionLoss for /org.apache.manifoldcf.configuration
>     at org.apache.manifoldcf.core.system.ManifoldCF.initializeEnvironment(ManifoldCF.java:269)
>     at org.apache.manifoldcf.agents.system.ManifoldCF.initializeEnvironment(ManifoldCF.java:43)
>     at org.apache.manifoldcf.agents.BaseAgentsInitializationCommand.execute(BaseAgentsInitializationCommand.java:36)
>     at org.apache.manifoldcf.agents.AgentRun.main(AgentRun.java:93)
> This has a knock-on effect on the other agents, which then eventually fail
> with 'agents process could not start - shutting down'.
> Besides exceptions of this type:
> 5401 [main-SendThread(overlorddev03:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server overlorddev03/10.250.0.36:2181. Will not attempt to authenticate using SASL (unknown error)
> 5403 [main-SendThread(overlorddev03:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 5506 [main-SendThread(overlorddev04:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server overlorddev04/10.250.0.46:2181. Will not attempt to authenticate using SASL (unknown error)
> 5507 [main-SendThread(overlorddev04:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to overlorddev04/10.250.0.46:2181, initiating session
> the only other notable exception is:
> 5509 [main-SendThread(overlorddev04:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server overlorddev04/10.250.0.46:2181, sessionid = 0x4444f2cb0590087, negotiated timeout = 8000
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: KeeperErrorCode = ConnectionLoss for /org.apache.manifoldcf.flags-_AGENTRUN_
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.checkGlobalFlag(ZooKeeperConnection.java:499)
>     at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.checkGlobalFlag(ZooKeeperLockManager.java:787)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon.runAgents(AgentsDaemon.java:110)
>     at org.apache.manifoldcf.agents.AgentRun.doExecute(AgentRun.java:64)
>     at org.apache.manifoldcf.agents.BaseAgentsInitializationCommand.execute(BaseAgentsInitializationCommand.java:37)
>     at org.apache.manifoldcf.agents.AgentRun.main(AgentRun.java:93)
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)