Joe,

Assuming that you're using an embedded ZooKeeper server, it is not surprising that you saw a lot of ERROR-level messages about dropped ZK connections. Since only 1 of your 3 NiFi nodes was up, you had only 1 of 3 ZK servers, so there was no quorum, and the node was continually trying to connect to servers that were not available. Once the other nodes are started, you should be okay.
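For reference, the majority rule at work here can be sketched as follows (the ensemble size of 3 matches this cluster; everything else is just arithmetic):

```shell
# ZooKeeper needs a strict majority of its ensemble to form quorum:
# floor(n/2) + 1 servers. With embedded ZK on a 3-node NiFi cluster,
# 2 of the 3 servers must be running; a single node alone can never
# form quorum, which is why the lone node kept erroring.
n=3
echo "quorum needed: $(( n / 2 + 1 )) of $n"   # → quorum needed: 2 of 3
```

Note that this is also why 3-node (rather than 2-node) ensembles are the usual minimum: losing one server still leaves a majority.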
The log messages that you are seeing there with the odd ports are, I believe, the ephemeral ports that the client is using for the outgoing connection. These should not need to be opened up on your VM (assuming that you're not blocking outbound ports). The last message there indicates that a session was established with a negotiated timeout of 4000 milliseconds, so I don't believe there's any problem with ports being blocked.

However, once the nodes have all started up, they shouldn't have problems connecting to each other. Can you grep your logs for "changed from"? NiFi logs at INFO level every time the connection status of a node in the cluster changes. This may shed some light on why the nodes were not connecting to the cluster.

Thanks
-Mark

On Nov 18, 2016, at 12:30 PM, Jeff <[email protected]> wrote:

Joe, I'm glad you were able to get the nodes to reconnect, but I'm interested to know how it got into a state where it couldn't start up previously. If you can reproduce the scenario and provide the full logs and your NiFi configuration, we can investigate what caused it to get into that state.

On Fri, Nov 18, 2016 at 12:17 PM Joe Gresock <[email protected]> wrote:

I waited the 5 minutes of the election process, and then several minutes beyond that. Incidentally, when I cleared the state (except zookeeper/my_id) from all the nodes, deleted the flow.xml.gz from all but one of the nodes, and then restarted the whole cluster, it came back.

On Fri, Nov 18, 2016 at 5:11 PM, Jeff <[email protected]> wrote:

Hello Joe,

Just out of curiosity, how long did you let NiFi run while waiting for the nodes to connect?

On Fri, Nov 18, 2016 at 10:53 AM Joe Gresock <[email protected]> wrote:

Despite starting up, the nodes now cannot connect to each other, so they're all listed as Disconnected in the UI.
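Mark's grep suggestion earlier in the thread can be run like this. The log path and the sample line are assumptions for illustration: a default install writes to logs/nifi-app.log, and the real connection-status line from the cluster coordinator is longer than the approximation shown here.

```shell
# Write an illustrative sample line (format approximated, not a verbatim
# NiFi log line), then grep for the connection-status transitions that
# NiFi records at INFO level.
mkdir -p /tmp/nifi-logs
cat > /tmp/nifi-logs/nifi-app.log <<'EOF'
2016-11-18 15:55:01,000 INFO [...] o.a.n.c.c.node.NodeClusterCoordinator Status of node ip-172-31-33-34.ec2.internal:8443 changed from CONNECTING to DISCONNECTED
EOF
grep -h "changed from" /tmp/nifi-logs/nifi-app*.log
# prints the sample line above
```

On a real install, point the grep at the logs directory under your NiFi home, including rolled-over files (e.g. nifi-app*.log), so transitions from before the last restart are included.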
I see this in the logs:

2016-11-18 15:50:19,080 INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /172.31.33.34:47224
2016-11-18 15:50:19,081 INFO [CommitProcessor:2] o.a.zookeeper.server.ZooKeeperServer Established session 0x258781845940bf9 with negotiated timeout 4000 for client /172.31.33.34:47224
2016-11-18 15:50:19,185 INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /172.31.33.34:47228
2016-11-18 15:50:19,186 INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /172.31.33.34:47230
2016-11-18 15:50:19,187 INFO [CommitProcessor:2] o.a.zookeeper.server.ZooKeeperServer Established session 0x258781845940bfa with negotiated timeout 4000 for client /172.31.33.34:47228
2016-11-18 15:50:19,187 INFO [CommitProcessor:2] o.a.zookeeper.server.ZooKeeperServer Established session 0x258781845940bfb with negotiated timeout 4000 for client /172.31.33.34:47230
2016-11-18 15:50:19,292 INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /172.31.33.34:47234
2016-11-18 15:50:19,293 INFO [CommitProcessor:2] o.a.zookeeper.server.ZooKeeperServer Established session 0x258781845940bfc with negotiated timeout 4000 for client /172.31.33.34:47234

However, I definitely did not open any ports similar to 47234 on my NiFi VMs. Is there a certain set of ports that needs to be open between the servers? My understanding was that only 2888, 3888, and 2181 were necessary for ZooKeeper.

On Fri, Nov 18, 2016 at 3:41 PM, Joe Gresock <[email protected]> wrote:

It appears that if you try to start up just one node in a cluster with multiple ZooKeeper hosts specified in zookeeper.properties, this error is spammed at an incredible rate in your logs.
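On the ports question above: between ZooKeeper servers, only the two ports in each server.N entry must be open (quorum traffic and leader election), plus the clientPort for the NiFi/Curator clients. The 47xxx addresses in the log are client-side ephemeral source ports chosen by the OS for outbound connections and need no inbound firewall rule. A sketch with hypothetical hostnames and the common default ports:

```shell
# Hypothetical embedded-ZK config, in zookeeper.properties style.
cat > /tmp/zookeeper.properties <<'EOF'
clientPort=2181
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
EOF
# Extract the two inter-server ports from the server.N entries
# (quorum traffic on 2888, leader election on 3888):
grep -oE '[0-9]+:[0-9]+$' /tmp/zookeeper.properties | tr ':' '\n' | sort -u
# prints 2888 and 3888, each on its own line
```

So the set Joe lists (2888, 3888, and the 2181 client port) matches what needs to be reachable between servers; nothing in the ephemeral range has to be opened.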
When I started up all 3 nodes at once, they didn't receive the error.

On Fri, Nov 18, 2016 at 3:18 PM, Joe Gresock <[email protected]> wrote:

I'm upgrading a test 0.x NiFi cluster to 1.x using the latest in master as of today. I was able to successfully start the 3-node cluster once, but then I restarted it and got the following error spammed in nifi-app.log. I'm not sure where to start debugging this, and I'm puzzled why it would work once and then start giving me errors on the second restart. Has anyone run into this error?

2016-11-18 15:07:18,178 INFO [main] org.eclipse.jetty.server.Server Started @83426ms
2016-11-18 15:07:18,883 INFO [main] org.apache.nifi.web.server.JettyServer Loading Flow...
2016-11-18 15:07:18,889 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 9001
2016-11-18 15:07:19,117 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: ip-172-31-33-34.ec2.internal:8443
2016-11-18 15:07:25,781 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2016-11-18 15:07:25,782 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
2016-11-18 15:07:34,685 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper.
Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2016-11-18 15:07:34,685 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
2016-11-18 15:07:34,696 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-11-18 15:07:34,698 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@671a652a Connection State changed to SUSPENDED
2016-11-18 15:07:34,699 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.11.0.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_111]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]

--
I know what it is to be in need, and I know what it is to have plenty. I have learned the secret of being content in any and every situation, whether well fed or hungry, whether living in plenty or in want. I can do all this through him who gives me strength.
-Philippians 4:12-13
