[
https://issues.apache.org/jira/browse/NIFI-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411792#comment-15411792
]
ASF subversion and git services commented on NIFI-2406:
-------------------------------------------------------
Commit c1c052af71d4cfc8a40974e1482d5b0f3c78c902 in nifi's branch
refs/heads/master from [~markap14]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=c1c052a ]
NIFI-2406: Ensure that the heartbeat monitor continues to run while the
instance is running. This way, if a node sends a heartbeat to this node as the
elected coordinator changes, we notify the node accordingly. Handle Exceptions
more gracefully in leader election code. Tweaked how nodes reconnect to the
cluster to improve cluster stability.
Signed-off-by: Yolanda M. Davis <[email protected]>
This closes #729
> Rare start-up problems resulting in all nodes disconnected
> ----------------------------------------------------------
>
> Key: NIFI-2406
> URL: https://issues.apache.org/jira/browse/NIFI-2406
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Joseph Percivall
> Assignee: Mark Payne
> Fix For: 1.0.0
>
> Attachments: logs.tar.gz
>
>
> While testing PR 678[1], I hit a case where all the nodes were in a
> disconnected state: each node was still heartbeating but was not
> connected.
> The logs also contained ~1000 lines of:
> 2016-07-26 11:38:07,841 INFO [Leader Election Notification Thread-1]
> o.a.n.c.l.e.CuratorLeaderElectionManager
> org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@24fae8c6
> This node has been elected Leader for Role 'Cluster Coordinator'
> This message is only logged here[2], in a callback for ZK. There were also
> many log messages of:
> 2016-07-26 11:54:07,910 WARN [Clustering Tasks Thread-1]
> o.a.n.c.c.node.NodeClusterCoordinator Failed to determine which node is
> elected active Cluster Coordinator: ZooKeeper reports the address as
> localhost:6001, but there is no node with this address
> I believe this is a problem with ZK/NiFi that existed before this PR and is
> not directly related to the PR being reviewed. I will attach a tar of the 3
> nodes' logs.
> [1] https://github.com/apache/nifi/pull/678
> [2]
> https://github.com/apache/nifi/blame/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/leader/election/CuratorLeaderElectionManager.java#L220-L220