Are you using the embedded ZooKeeper? If so, we recommend using an external ZooKeeper.
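As a rough sketch of what moving to an external ZooKeeper involves in nifi.properties on each node: the hostnames below are hypothetical placeholders, and the connect string has to match your own ensemble.

```properties
# Do not start the ZooKeeper server embedded in this NiFi node
nifi.state.management.embedded.zookeeper.start=false

# Point NiFi at the external ensemble (hypothetical hosts; substitute your own)
nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
```

The cluster state provider in state-management.xml has its own Connect String property, which would normally point at the same ensemble.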
What type of load are the systems under when this occurs (CPU, network, memory, disk I/O)? Under high load the default timeouts for clustering are too aggressive. You can relax these for higher-load clusters and should see good behavior. Even if the system overall is not under particularly high load, lengthy and/or frequent garbage collection pauses can have the same effect as high load as far as the JVM is concerned.

Thanks
Joe

On Wed, May 24, 2017 at 9:11 AM, Mark Bean <[email protected]> wrote:

> We have a cluster which is showing signs of instability. The Primary Node
> and Coordinator are reassigned to different nodes every several minutes. I
> believe this is due to lack of heartbeat or other coordination. The
> following error occurs periodically in the nifi-app.log
>
> ERROR [CommitProcessor:1] o.apache.zookeeper.server.NIOServerCnxn
> Unexpected Exception:
> java.nio.channels.CancelledKeyException: null
>         at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>         at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
>         at
> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
>         at
> org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
>         at
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
>         at
> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
>
> Apache NiFi 1.2.0
>
> Thoughts?
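
For reference, the clustering timeouts mentioned above are set in nifi.properties. The fragment below is only a sketch: the property names are the standard ones, but the values are illustrative assumptions and not tuned recommendations, so adjust them to what your load and GC behavior actually require.

```properties
# Node-to-node cluster communication (illustrative values, not recommendations)
nifi.cluster.node.connection.timeout=30 sec
nifi.cluster.node.read.timeout=30 sec

# ZooKeeper client timeouts used for Coordinator and Primary Node election
nifi.zookeeper.connect.timeout=10 secs
nifi.zookeeper.session.timeout=10 secs
```

The same values should be applied on every node, and each node needs a restart to pick them up.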
