[
https://issues.apache.org/jira/browse/NIFI-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060948#comment-17060948
]
Ganesh Banda commented on NIFI-6875:
------------------------------------
Hi Team,
I am using Nifi 1.11.3 with external Zookeeper 5.3.6. Able to start the Nifi
in a cluster mode. When I made complete ZK down, Nifi throwing bellow error and
never join with the cluster. I need to restart the Nifi each node to to form
cluster back again. Could you please help here ? I think for the production
system restarting is not a good option I feel. Tried to increase zookeeper
timeout to high values but didn't worked.
Logs:
2020-03-17 14:02:19,708 INFO [main-EventThread]
o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2020-03-17 14:02:19,709 INFO [Curator-ConnectionStateManager-0]
o.a.n.c.l.e.CuratorLeaderElectionManager
org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@a032996
Connection State changed to SUSPENDED
2020-03-17 14:02:19,710 INFO [Curator-ConnectionStateManager-0]
o.a.n.c.l.e.CuratorLeaderElectionManager
org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@e3cbc0e
Connection State changed to SUSPENDED
.
.
.
2020-03-17 14:19:00,044 ERROR [Curator-Framework-0]
o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode =
ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-17 14:19:00,045 ERROR [Curator-Framework-0]
o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
ConnectionLoss
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:972)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-17 14:19:00,098 ERROR [Curator-Framework-0]
o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode =
ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-17 14:19:00,098 ERROR [Curator-Framework-0]
o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
ConnectionLoss
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:972)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
> Nifi Zookeeper Cluster_Mode broken in 1.10.0
> --------------------------------------------
>
> Key: NIFI-6875
> URL: https://issues.apache.org/jira/browse/NIFI-6875
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework, Flow Versioning
> Affects Versions: 1.10.0
> Environment: Kubernetes, Linux
> Reporter: Glenn Wolfe
> Priority: Blocker
> Labels: bug, cluster-mode, kubernetes
>
> Expected: Exact same configuration and setup works perfectly on prior version
> (1.9.2), as soon as I upgrade version, NIfi is unable to initialize.
>
> With external zookeeper (cluster_mode) configuration, Nifi is unable to
> successfully elect leader and stuck in 'Invalid State: The Flow Controller is
> initializing the Data Flow'.
>
> Logs: (Stuck in Loop)
> ```
> 2019-11-15 17:00:05,991 INFO [main-EventThread]
> o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
> 2019-11-15 17:00:05,991 INFO [Curator-ConnectionStateManager-0]
> o.a.n.c.l.e.CuratorLeaderElectionManager
> org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@256c961a
> Connection State changed to RECONNECTED
> 2019-11-15 17:00:06,092 INFO [main-EventThread]
> o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
> 2019-11-15 17:00:06,092 WARN [main] o.a.nifi.controller.StandardFlowService
> There is currently no Cluster Coordinator. This often happens upon restart of
> NiFi when running an embedded ZooKeeper. Will register this node to become
> the active Cluster Coordinator and will attempt to connect to cluster again
> 2019-11-15 17:00:06,093 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager
> CuratorLeaderElectionManager[stopped=false] Attempted to register Leader
> Election for role 'Cluster Coordinator' but this role is already registered
> ```
--
This message was sent by Atlassian Jira
(v8.3.4#803005)