[ 
https://issues.apache.org/jira/browse/NIFI-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410050#comment-15410050
 ] 

ASF GitHub Bot commented on NIFI-2406:
--------------------------------------

Github user YolandaMDavis commented on the issue:

    https://github.com/apache/nifi/pull/729
  
    @markap14 thank you for this update. I executed my tests as previously 
described and ran into no issues. When disconnecting the node designated as 
cluster coordinator, both from the UI and on the command line (through 
stop/restart), the remaining nodes behave as expected. They properly display an 
error to the user and note in the logs their inability to determine a cluster 
coordinator. When the coordinator is back online, the logs indicate successful 
reconnection and communication. Refreshing the screen shows that the 
flow and configured flow (which uses remote process groups 
communicating with a standalone NiFi node) operate successfully.
    
    +1
    
    thank you @markap14 for the quick turnaround on this!



> Rare start-up problems resulting in all nodes disconnected
> ----------------------------------------------------------
>
>                 Key: NIFI-2406
>                 URL: https://issues.apache.org/jira/browse/NIFI-2406
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Joseph Percivall
>            Assignee: Mark Payne
>             Fix For: 1.0.0
>
>         Attachments: logs.tar.gz
>
>
> While testing PR 678[1], I came across a time where all the nodes were in a 
> disconnected state and each was in a weird state of heartbeating but not 
> connected.
> Also in the logs there were ~1000 lines of:
> 2016-07-26 11:38:07,841 INFO [Leader Election Notification Thread-1] 
> o.a.n.c.l.e.CuratorLeaderElectionManager 
> org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@24fae8c6
>  This node has been elected Leader for Role 'Cluster Coordinator'
> This message is only logged here[2], in a callback for ZK. Also there 
> were many log messages of:
> 2016-07-26 11:54:07,910 WARN [Clustering Tasks Thread-1] 
> o.a.n.c.c.node.NodeClusterCoordinator Failed to determine which node is 
> elected active Cluster Coordinator: ZooKeeper reports the address as 
> localhost:6001, but there is no node with this address
> I believe this is a problem with ZK/NiFi that existed before this PR and not 
> directly related to the PR being reviewed. I will attach a tar of the 3 
> nodes' logs.
> [1] https://github.com/apache/nifi/pull/678
> [2] 
> https://github.com/apache/nifi/blame/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/leader/election/CuratorLeaderElectionManager.java#L220-L220



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
