I recently restarted three NiFi 1.0.0 servers as a cluster: one coordinator node running ZooKeeper, and two other nodes without ZooKeeper. The cluster worked well for a few days, then the nodes disconnected. I have tried restarting all of them, but after an hour they are still not connected.
When I log into any of the nodes on the web UI, I see this dialog immediately:
<http://apache-nifi-developer-list.39713.n7.nabble.com/file/n13888/NodesNotConnectedWarning.png>

And the cluster server shows this:
<http://apache-nifi-developer-list.39713.n7.nabble.com/file/n13888/NodesNotConnected.png>

or this:
<http://apache-nifi-developer-list.39713.n7.nabble.com/file/n13888/NodesNotConnected_2.png>

In the logs, I have the following errors or messages:

Node 1:

2016-11-15 09:29:01,262 ERROR [SyncThread:0] o.apache.zookeeper.server.NIOServerCnxn Unexpected Exception:
java.nio.channels.CancelledKeyException: null
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) ~[na:1.8.0_65]
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77) ~[na:1.8.0_65]
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151) [zookeeper-3.4.6.jar:3.4.6-1569965]
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081) [zookeeper-3.4.6.jar:3.4.6-1569965]
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170) [zookeeper-3.4.6.jar:3.4.6-1569965]
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169) [zookeeper-3.4.6.jar:3.4.6-1569965]
2016-11-15 09:29:02,622 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to renew session 0x158689232210005 at /10.178.237.180:35472
2016-11-15 09:29:02,623 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Established session 0x158689232210005 with negotiated timeout 4000 for client /10.178.237.180:35472
2016-11-15 09:29:21,053 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@25134e01 checkpointed with 1 Records and 0 Swap Files in 8 milliseconds (Stop-the-world time = 2 milliseconds, Clear Edit Logs time = 2 millis), max Transaction ID 4

Nodes 2 and 3 show these:

2016-11-15 09:29:01,311 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-11-15 09:29:02,623 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
2016-11-15 09:29:34,276 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLocki9e092b5 checkpointed with 1 Records and 0 Swap Files in 11 milliseconds (Stop-the-world time = 2 milliseconds, Clear Edit Logs time = ransaction ID 3
2016-11-15 09:29:38,149 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-11-15 09:29:39,647 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
2016-11-15 09:29:54,671 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-11-15 09:29:56,304 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
2016-11-15 09:30:03,467 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-11-15 09:30:04,665 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED

Does anyone know what is going on here?

Regards,
Ben

--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Nodes-will-not-reconnect-tp13888.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
