Dealing with cluster errors

Joe Gresock Fri, 10 Feb 2017 04:01:28 -0800

We have a 7-node cluster, and we currently use the embedded ZooKeeper on 3
of the nodes.  I've noticed that when our flow is handling a high volume
(which hits the CPU pretty hard), I have a really hard time getting the
console page to come up.  It cycles through the following error messages
when I reload the page:



   - An unexpected error has occurred.  Please check the logs.  (there is
   never any error in the logs for this one)
   - Could not replicate request to <hostname> because the node is not
   connected  (this is never the host I'm currently trying to hit, which
   makes the error text feel a bit irrelevant to the user, i.e., "I wasn't
   trying to replicate a request to that node, I just want to load the
   console on this node")
   - An error occurred communicating with the application core.  Please
   check the logs and fix any configuration issues before restarting.  (Again,
   I can't find any errors in nifi-app.log or nifi-user.log)

I can go about a half-hour reloading the page before it comes up once, and
then I can only get maybe one action in before it auto-refreshes and shows
me one of the above error messages again.

My first thought was that moving to external ZooKeeper servers would
improve this, but that's just a hunch.  Has anyone encountered this
behavior with high data volume?
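
For reference, a minimal sketch of what that change might look like in
nifi.properties on each node.  The hostnames zk1, zk2 and zk3 are
placeholders for three hypothetical external ZooKeeper servers on the
default client port 2181; this is untested against this cluster and is
not a claim that it fixes the errors above.

```properties
# Assumption: stop launching the embedded ZooKeeper inside the NiFi JVM
nifi.state.management.embedded.zookeeper.start=false

# Assumption: zk1/zk2/zk3 are placeholder hostnames for the external ensemble
nifi.zookeeper.connect.string=zk1:2181,zk2:2181,zk3:2181
```

The "Connect String" of the cluster state provider in state-management.xml
would presumably need to point at the same servers.
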
Joe

-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.    *-Philippians 4:12-13*
