The problem you're running into is that when NiFi starts, it tries to connect to ZooKeeper, but you have a two-node cluster with both nodes running an embedded ZooKeeper. A ZooKeeper ensemble needs a majority of its servers up to form a quorum, and the majority of two is two, so both nodes must be running before a quorum can be established. At startup, then, each node fails to connect to ZooKeeper because there is no quorum yet.
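For context, that two-server ensemble is what a conf/zookeeper.properties like the following declares on both nodes (the hostnames here are illustrative, not from your setup); with two voting members, losing either one means no majority:

    server.1=node1.example.com:2888:3888
    server.2=node2.example.com:2888:3888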
For production use, I would highly recommend an external ZooKeeper instead of the embedded instance. For a simple cluster for testing/integration/etc., the embedded ZooKeeper is fine, but run either one ZooKeeper instance or three, never two. If you only need two NiFi nodes, remove the "server.2" line from the zookeeper.properties file on Node 1, and then on Node 2 set the "nifi.state.management.embedded.zookeeper.start" property to false (a sketch of both changes is in the P.S. below). At that point, as long as Node 1 is started first, Node 2 should have no problem joining. This is partly why we recommend an external ZooKeeper for any sort of production use.

Thanks
-Mark
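P.S. A minimal sketch of the two changes, assuming default ports and an illustrative hostname (substitute whatever Node 1 actually resolves to in your environment):

    # Node 1, conf/zookeeper.properties: a single-server ensemble
    # (the server.2 line has been removed)
    clientPort=2181
    server.1=node1.example.com:2888:3888

    # Node 2, conf/nifi.properties: don't start an embedded ZooKeeper
    nifi.state.management.embedded.zookeeper.start=false

    # Both nodes, nifi.properties: the connect string should now
    # reference only Node 1's ZooKeeper
    nifi.zookeeper.connect.string=node1.example.com:2181

Both nodes will already have nifi.zookeeper.connect.string set if they were clustering before; the point is that it should now point only at Node 1.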
> On Jun 29, 2017, at 3:47 AM, nifi-san <nairsande...@gmail.com> wrote:
>
> I was able to get over this. There was a typo, and I can now start the two
> clustered NiFi nodes.
>
> However, I keep getting the messages below on both nodes when I try to
> start them:
>
> 2017-06-29 13:03:58,537 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
> 2017-06-29 13:03:58,538 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
> 2017-06-29 13:04:20,867 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
> 2017-06-29 13:04:20,867 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
> 2017-06-29 13:04:28,871 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
> 2017-06-29 13:04:28,872 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@25c5f7c5 Connection State changed to SUSPENDED
> 2017-06-29 13:04:28,878 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
>
> The error above appears on both nodes.
>
> I tried pointing the java.security file at /dev/urandom on both nodes in
> the cluster, but it did not help. I also modified the properties below in
> nifi.properties on both nodes:
>
> nifi.cluster.flow.election.max.wait.time=5 mins
> nifi.cluster.flow.election.max.candidates=2
>
> It still does not work.
>
> The only ports established are:
>
> root@hostname-1:/opt/nifi/nifi-1.3.0/conf# netstat -an | grep 8080
> tcp        0      0 127.0.1.1:8080          0.0.0.0:*       LISTEN
> root@hostname-1:/opt/nifi/nifi-1.3.0/conf# netstat -an | grep 9999
> tcp        0      0 0.0.0.0:9999            0.0.0.0:*       LISTEN
> root@hostname-1:/opt/nifi/nifi-1.3.0/conf# netstat -an | grep 9998    -- not running
> root@hostname-1:/opt/nifi/nifi-1.3.0/conf# netstat -an | grep 2888    -- not running
> root@hostname-1:/opt/nifi/nifi-1.3.0/conf# netstat -an | grep 3888
> tcp        0      0 127.0.1.1:3888          0.0.0.0:*       LISTEN
>
> I tried pinging the hostnames from each of the two nodes, and they look
> fine. The firewall has been disabled.
>
> Any pointers?
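One follow-up on the netstat output quoted above: none of those checks looks at the ZooKeeper client port (2181 by default), which is the port Curator is actually failing to reach. A quick way to probe it, assuming the standard ZooKeeper four-letter-word commands are enabled (they are by default in the 3.4.x line that NiFi 1.3.0 embeds):

    # A healthy ZooKeeper replies "imok"
    echo ruok | nc hostname-1 2181

    # Mode is "standalone", "leader" or "follower" once a quorum exists;
    # no reply at all means the ensemble never formed
    echo srvr | nc hostname-1 2181 | grep Mode

If nothing is listening on 2181, the embedded ZooKeeper never bound its client port, which would match the ConnectionLoss errors above.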