Joseph Percivall created NIFI-2406:
--------------------------------------

             Summary: Rare start-up problems resulting in all nodes disconnected
                 Key: NIFI-2406
                 URL: https://issues.apache.org/jira/browse/NIFI-2406
             Project: Apache NiFi
          Issue Type: Bug
            Reporter: Joseph Percivall


While testing PR 678[1], I hit a case where all the nodes were disconnected.
Each node was still heartbeating but never became connected.

The logs also contained ~1000 lines like this:

2016-07-26 11:38:07,841 INFO [Leader Election Notification Thread-1] 
o.a.n.c.l.e.CuratorLeaderElectionManager 
org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@24fae8c6
 This node has been elected Leader for Role 'Cluster Coordinator'

This message is only logged here[2], in a callback invoked through ZooKeeper.
There were also many log messages like this:

2016-07-26 11:54:07,910 WARN [Clustering Tasks Thread-1] 
o.a.n.c.c.node.NodeClusterCoordinator Failed to determine which node is elected 
active Cluster Coordinator: ZooKeeper reports the address as localhost:6001, 
but there is no node with this address
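To make the second warning concrete, here is a minimal hypothetical sketch of the kind of lookup it describes. This is NOT NiFi's actual NodeClusterCoordinator code. The class name CoordinatorLookup, the method findElectedCoordinator, and the node addresses are all illustrative assumptions. The coordinator takes the address ZooKeeper reports for the elected Cluster Coordinator and looks for a known node with exactly that address. If none matches, no coordinator can be determined.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the lookup behind the WARN message; not NiFi source.
// Node addresses are modeled as plain "host:port" strings (an assumption).
public class CoordinatorLookup {

    // Returns the known node whose address equals the address ZooKeeper
    // reports for the elected Cluster Coordinator, or empty if none matches.
    static Optional<String> findElectedCoordinator(String zkReportedAddress,
                                                   List<String> knownNodeAddresses) {
        return knownNodeAddresses.stream()
                .filter(addr -> addr.equals(zkReportedAddress))
                .findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical cluster membership; hostnames and ports are illustrative only.
        List<String> nodes = List.of("node1:6001", "node2:6001", "node3:6001");
        Optional<String> coordinator = findElectedCoordinator("localhost:6001", nodes);
        if (coordinator.isEmpty()) {
            System.out.println("WARN: ZooKeeper reports the address as localhost:6001, "
                    + "but there is no node with this address");
        } else {
            System.out.println("Elected coordinator: " + coordinator.get());
        }
    }
}
```

In this model the lookup fails whenever the address written to ZooKeeper differs from every address the cluster knows its nodes by. The sketch does not claim that this is what happened in the attached logs.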

I believe this is a ZK/NiFi problem that predates this PR and is not directly
related to the PR under review. I will attach a tar of the logs from all 3
nodes.

[1] https://github.com/apache/nifi/pull/678
[2] https://github.com/apache/nifi/blame/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/leader/election/CuratorLeaderElectionManager.java#L220-L220



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
