[
https://issues.apache.org/jira/browse/NIFI-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061068#comment-17061068
]
Mark Payne commented on NIFI-7265:
----------------------------------
[~ganeshb] I think we would need more information to understand the problem
that you're running into. I attempted to replicate this but failed to do so.
When I stopped ZooKeeper, I did see the errors noted above spewing into the
logs. However, when I restarted ZooKeeper, within a second or two all nodes in
my cluster managed to reconnect and continue on correctly.
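One piece of information that could help narrow this down is whether each ZooKeeper address is resolvable and accepting connections from inside a NiFi node once ZooKeeper is back up. The following is only a minimal sketch for collecting that, using the JDK alone; the class name ZkReachability, the 2000 ms timeout, and the fallback port 2181 are illustrative assumptions and not part of NiFi or Curator.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: checks TCP reachability of every server in a
// ZooKeeper connect string. It does not speak the ZooKeeper protocol.
class ZkReachability {

    // Splits a connect string such as "zk1:2181,zk2:2181/nifi" into
    // host/port pairs, ignoring an optional chroot suffix.
    static List<InetSocketAddress> parse(String connectString) {
        int chroot = connectString.indexOf('/');
        String hosts = chroot >= 0 ? connectString.substring(0, chroot) : connectString;
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : hosts.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.lastIndexOf(':');
            String host = colon >= 0 ? trimmed.substring(0, colon) : trimmed;
            // Assumption: fall back to 2181 when no port is given.
            int port = colon >= 0 ? Integer.parseInt(trimmed.substring(colon + 1)) : 2181;
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    // Resolves the host name again on every call and attempts a TCP connect.
    static boolean reachable(InetSocketAddress addr, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(addr.getHostString(), addr.getPort()), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String connectString = args.length > 0 ? args[0] : "localhost:2181";
        for (InetSocketAddress addr : parse(connectString)) {
            System.out.println(addr.getHostString() + ":" + addr.getPort()
                    + " reachable=" + reachable(addr, 2000));
        }
    }
}
```

Running it on a NiFi node with the same value as nifi.zookeeper.connect.string, once before stopping ZooKeeper and once after restarting it, would show whether the node can reach ZooKeeper at all at the point where it fails to rejoin.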
> [main-EventThread] o.a.c.f.state.ConnectionStateManager State change:
> SUSPENDED and KeepeerError
> ------------------------------------------------------------------------------------------------
>
> Key: NIFI-7265
> URL: https://issues.apache.org/jira/browse/NIFI-7265
> Project: Apache NiFi
> Issue Type: Bug
> Components: Configuration Management, Core Framework
> Affects Versions: 1.11.3
> Reporter: Ganesh Banda
> Priority: Critical
> Labels: Linux, kubernetes
>
> I am using NiFi 1.11.3 with an external ZooKeeper 5.3.6 and am able to start
> NiFi in cluster mode. When I take ZooKeeper down completely, NiFi throws the
> errors below and never rejoins the cluster. I need to restart NiFi on each
> node to form the cluster again. Could you please help here? Restarting is
> not a good option for a production system. I tried increasing the ZooKeeper
> timeout to high values, but that did not work.
> Logs:
> 2020-03-17 14:02:19,708 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
> 2020-03-17 14:02:19,709 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@a032996 Connection State changed to SUSPENDED
> 2020-03-17 14:02:19,710 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@e3cbc0e Connection State changed to SUSPENDED
> .
> .
> .
> 2020-03-17 14:19:00,044 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2020-03-17 14:19:00,045 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:972)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2020-03-17 14:19:00,098 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2020-03-17 14:19:00,098 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:972)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)