The problem you're running into is that when NiFi starts, it tries to connect to ZooKeeper, but you have a two-node cluster with both nodes running an embedded ZooKeeper. A ZooKeeper ensemble needs a majority of its servers up to form a quorum, and the majority of two is two, so both nodes must be running before a quorum can be established. At startup, then, each node fails to connect to ZooKeeper because there is no quorum yet.
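For context, that two-server ensemble is what a conf/zookeeper.properties like the following declares on both nodes (the hostnames here are illustrative, not from your setup); with two voting members, losing either one means no majority:

    server.1=node1.example.com:2888:3888
    server.2=node2.example.com:2888:3888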
For production use, I would highly recommend an external ZooKeeper instead of the embedded instance. For a simple cluster for testing/integration/etc., the embedded ZooKeeper is fine, but run either one ZooKeeper instance or three, never two. If you only need two NiFi nodes, remove the "server.2" line from the zookeeper.properties file on Node 1, and then on Node 2 set the "nifi.state.management.embedded.zookeeper.start" property to false (a sketch of both changes is in the P.S. below). At that point, as long as Node 1 is started first, Node 2 should have no problem joining. This is partly why we recommend an external ZooKeeper for any sort of production use.

Thanks
-Mark
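P.S. A minimal sketch of the two changes, assuming default ports and an illustrative hostname (substitute whatever Node 1 actually resolves to in your environment):

    # Node 1, conf/zookeeper.properties: a single-server ensemble
    # (the server.2 line has been removed)
    clientPort=2181
    server.1=node1.example.com:2888:3888

    # Node 2, conf/nifi.properties: don't start an embedded ZooKeeper
    nifi.state.management.embedded.zookeeper.start=false

    # Both nodes, nifi.properties: the connect string should now
    # reference only Node 1's ZooKeeper
    nifi.zookeeper.connect.string=node1.example.com:2181

Both nodes will already have nifi.zookeeper.connect.string set if they were clustering before; the point is that it should now point only at Node 1.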
> On Jun 29, 2017, at 3:47 AM, nifi-san <nairsande...@gmail.com> wrote:
>
> I was able to get over this. There was a typo, and I can now start the two
> clustered NiFi nodes.
>
> However, I keep getting the messages below on both nodes when I try to
> start them:
>
> 2017-06-29 13:03:58,537 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
> 2017-06-29 13:03:58,538 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
> 2017-06-29 13:04:20,867 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
> 2017-06-29 13:04:20,867 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
> 2017-06-29 13:04:28,871 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
> 2017-06-29 13:04:28,872 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@25c5f7c5 Connection State changed to SUSPENDED
> 2017-06-29 13:04:28,878 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
>
> The error above appears on both nodes.
>
> I tried pointing the java.security file at /dev/urandom on both nodes in
> the cluster, but it did not help. I also modified the properties below in
> nifi.properties on both nodes:
>
> nifi.cluster.flow.election.max.wait.time=5 mins
> nifi.cluster.flow.election.max.candidates=2
>
> It still does not work.
>
> The only ports established are:
>
> root@hostname-1:/opt/nifi/nifi-1.3.0/conf# netstat -an | grep 8080
> tcp        0      0 127.0.1.1:8080          0.0.0.0:*       LISTEN
> root@hostname-1:/opt/nifi/nifi-1.3.0/conf# netstat -an | grep 9999
> tcp        0      0 0.0.0.0:9999            0.0.0.0:*       LISTEN
> root@hostname-1:/opt/nifi/nifi-1.3.0/conf# netstat -an | grep 9998    -- not running
> root@hostname-1:/opt/nifi/nifi-1.3.0/conf# netstat -an | grep 2888    -- not running
> root@hostname-1:/opt/nifi/nifi-1.3.0/conf# netstat -an | grep 3888
> tcp        0      0 127.0.1.1:3888          0.0.0.0:*       LISTEN
>
> I tried pinging the hostnames from each of the two nodes, and they look
> fine. The firewall has been disabled.
>
> Any pointers?
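One follow-up on the netstat output quoted above: none of those checks looks at the ZooKeeper client port (2181 by default), which is the port Curator is actually failing to reach. A quick way to probe it, assuming the standard ZooKeeper four-letter-word commands are enabled (they are by default in the 3.4.x line that NiFi 1.3.0 embeds):

    # A healthy ZooKeeper replies "imok"
    echo ruok | nc hostname-1 2181

    # Mode is "standalone", "leader" or "follower" once a quorum exists;
    # no reply at all means the ensemble never formed
    echo srvr | nc hostname-1 2181 | grep Mode

If nothing is listening on 2181, the embedded ZooKeeper never bound its client port, which would match the ConnectionLoss errors above.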