Recently, I restarted three NiFi 1.0.0 servers as a cluster: one coordinator
node running ZooKeeper and two other nodes without ZooKeeper. The nodes
worked well for a few days and then disconnected. I have tried restarting
all of them, but after an hour they are still not connected.

When I log into the web UI on any of the nodes, I immediately see this
dialog:
<http://apache-nifi-developer-list.39713.n7.nabble.com/file/n13888/NodesNotConnectedWarning.png>
 
And the cluster server shows this:
<http://apache-nifi-developer-list.39713.n7.nabble.com/file/n13888/NodesNotConnected.png>
 
or this: 
<http://apache-nifi-developer-list.39713.n7.nabble.com/file/n13888/NodesNotConnected_2.png>
 

In the logs, I see the following errors and messages:
Node 1:
2016-11-15 09:29:01,262 ERROR [SyncThread:0] o.apache.zookeeper.server.NIOServerCnxn Unexpected Exception:
java.nio.channels.CancelledKeyException: null
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) ~[na:1.8.0_65]
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77) ~[na:1.8.0_65]
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151) [zookeeper-3.4.6.jar:3.4.6-1569965]
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081) [zookeeper-3.4.6.jar:3.4.6-1569965]
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170) [zookeeper-3.4.6.jar:3.4.6-1569965]
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169) [zookeeper-3.4.6.jar:3.4.6-1569965]
2016-11-15 09:29:02,622 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to renew session 0x158689232210005 at /10.178.237.180:35472
2016-11-15 09:29:02,623 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Established session 0x158689232210005 with negotiated timeout 4000 for client /10.178.237.180:35472
2016-11-15 09:29:21,053 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@25134e01 checkpointed with 1 Records and 0 Swap Files in 8 milliseconds (Stop-the-world time = 2 milliseconds, Clear Edit Logs time = 2 millis), max Transaction ID 4

Nodes 2 and 3 show the following:
2016-11-15 09:29:01,311 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-11-15 09:29:02,623 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
2016-11-15 09:29:34,276 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLocki9e092b5 checkpointed with 1 Records and 0 Swap Files in 11 milliseconds (Stop-the-world time = 2 milliseconds, Clear Edit Logs time = ransaction ID 3
2016-11-15 09:29:38,149 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-11-15 09:29:39,647 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
2016-11-15 09:29:54,671 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-11-15 09:29:56,304 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
2016-11-15 09:30:03,467 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-11-15 09:30:04,665 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
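
To put numbers on the flapping in the Nodes 2 and 3 log, here is a small
sketch that parses the Curator ConnectionStateManager lines quoted above and
measures how long each SUSPENDED state lasted before RECONNECTED, and how far
apart the suspensions were. It only summarizes the timestamps; it does not
diagnose the cause. The helper function names are hypothetical.

```python
from datetime import datetime

# State-change lines copied from the Nodes 2 and 3 log above.
LOG = """\
2016-11-15 09:29:01,311 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-11-15 09:29:02,623 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
2016-11-15 09:29:38,149 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-11-15 09:29:39,647 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
2016-11-15 09:29:54,671 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-11-15 09:29:56,304 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
2016-11-15 09:30:03,467 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-11-15 09:30:04,665 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
"""


def state_changes(text):
    """Return (timestamp, state) pairs for every Curator state-change line."""
    events = []
    for line in text.splitlines():
        if "State change:" not in line:
            continue
        # The first 23 characters are "YYYY-MM-DD HH:MM:SS,mmm".
        stamp = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        state = line.rsplit("State change:", 1)[1].strip()
        events.append((stamp, state))
    return events


def suspended_durations(events):
    """Seconds between each SUSPENDED and the RECONNECTED that follows it."""
    durations, start = [], None
    for stamp, state in events:
        if state == "SUSPENDED":
            start = stamp
        elif state == "RECONNECTED" and start is not None:
            durations.append((stamp - start).total_seconds())
            start = None
    return durations


def suspension_intervals(events):
    """Seconds between the starts of consecutive SUSPENDED states."""
    starts = [stamp for stamp, state in events if state == "SUSPENDED"]
    return [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]


events = state_changes(LOG)
print("outage lengths (s):", [round(d, 3) for d in suspended_durations(events)])
print("time between drops (s):", [round(i, 3) for i in suspension_intervals(events)])
```

On these lines each suspension lasts roughly 1.2 to 1.6 seconds (well under
the 4000 ms negotiated timeout seen on Node 1), and suspensions recur every
9 to 37 seconds.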

Does anyone know what is going on here?

Regards,
Ben



--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/Nodes-will-not-reconnect-tp13888.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
