Hello, Can you take a couple of thread dumps while this is happening and provide them so we can take a look?
You can put a file name as the argument to nifi.sh dump to have it written to a file.

Thanks,

Bryan

On Wed, Jan 24, 2018 at 6:48 AM we are <[email protected]> wrote:

> Hi,
>
> Recently we switched the server we run nifi on from a 24 core server to a 4
> core one, and since then approximately 4 times a day nifi stops responding
> until it is restarted. Then we switched to an 8 core server, and now it
> happens approximately every 2 days.
>
> When this happens, the UI becomes unresponsive, as well as the rest api.
> The number of nifi active threads metric returns 0 active threads, and the
> cpu is at 100% idle. There is no large spike in flowfiles, memory or cpu
> usage before nifi stops responding. But, when we checked the provenance
> repo we saw that events were getting created. The logs only show that
> events are being created, there are no errors or warnings. By looking into
> the content of the events we were able to determine that events were
> flowing up until a processor using the RedisConnectionPoolService.
>
> We tried to connect with the debugger to different processors and all of
> them, except 4, responded and the debugger connected successfully.
> The other 4 are using the RedisConnectionPoolService, and they didn't
> respond. 2 of these processors are custom ones we wrote, the other 2 are
> the built-in wait-notify mechanism. When we tried to connect to the
> RedisConnectionPoolService the debugger wasn't able to connect to it
> either. The redis service that the connection pool is connected to responds
> to us normally.
>
> We tried to look at the active threads using /opt/nifi/bin/nifi.sh dump,
> but we did not see anything strange.
>
> When we tried to dig into the problem we noticed that nifi uses an old
> version of spring-data-redis. We don't know if this is the problem but we
> opened an issue for this: https://issues.apache.org/jira/browse/NIFI-4811
>
> The maximum timer driven thread count is the default (10). Our custom
> processors are configured to a maximum of 10 concurrent tasks, and the
> wait/notify processors are configured to 5. The RedisConnectionPoolService
> is configured with the default values:
> Max Total: 20
> Max Idle: 8
> Min Idle: 0
> Block When Exhausted: true
> Max Evictable Idle Time: 60 seconds
> Time Between Eviction Runs: 30 seconds
> Num Tests Per Eviction Run: -1
>
> We made sure to always call connection.close() in our custom made
> processors.
> Is it possible that somehow connections are not released or evicted, and
> that is why nifi freezes like this? How can we determine that this is the
> case?
>
> Thanks!
> Daniel
>
-- Sent from Gmail Mobile
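
On the question of whether unreleased connections could produce this kind of freeze, here is a minimal sketch of the general failure mode. It is not NiFi or Redis client code: BoundedPool and Lease are hypothetical stand-ins built on a java.util.concurrent.Semaphore, with the limit taken from the "Max Total: 20" setting quoted above. It assumes a pool that hands out at most 20 connections and makes borrowers wait when none are free. The sketch uses a short timeout so it terminates; a borrower that waits with no timeout would simply hang at that point, which would look like idle threads with no errors in the logs.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class PoolExhaustionSketch {

    /** Hypothetical stand-in for a connection pool capped at maxTotal. */
    static final class BoundedPool {
        private final Semaphore permits;

        BoundedPool(int maxTotal) {
            this.permits = new Semaphore(maxTotal);
        }

        /** Borrows a lease, waiting up to timeoutMillis; returns null on timeout. */
        Lease borrow(long timeoutMillis) throws InterruptedException {
            if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return null;
            }
            return new Lease(permits);
        }

        int available() {
            return permits.availablePermits();
        }
    }

    /** Hypothetical stand-in for a borrowed connection; close() returns it. */
    static final class Lease implements AutoCloseable {
        private final Semaphore permits;
        private final AtomicBoolean closed = new AtomicBoolean(false);

        Lease(Semaphore permits) {
            this.permits = permits;
        }

        @Override
        public void close() {
            // Release the permit only once, even if close() is called twice.
            if (closed.compareAndSet(false, true)) {
                permits.release();
            }
        }
    }

    /** Borrows without ever closing; returns how many borrows timed out. */
    static int leakyUse(BoundedPool pool, int attempts) throws InterruptedException {
        int timedOut = 0;
        for (int i = 0; i < attempts; i++) {
            Lease lease = pool.borrow(10);
            if (lease == null) {
                timedOut++;
            }
            // The lease is dropped here without close(): this is the leak.
        }
        return timedOut;
    }

    /** Borrows inside try-with-resources so every lease is returned. */
    static int safeUse(BoundedPool pool, int attempts) throws InterruptedException {
        int timedOut = 0;
        for (int i = 0; i < attempts; i++) {
            try (Lease lease = pool.borrow(10)) {
                if (lease == null) {
                    timedOut++;
                }
            }
        }
        return timedOut;
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedPool leaky = new BoundedPool(20);
        System.out.println("leaky timeouts: " + leakyUse(leaky, 25)
                + ", available: " + leaky.available());

        BoundedPool safe = new BoundedPool(20);
        System.out.println("safe timeouts: " + safeUse(safe, 25)
                + ", available: " + safe.available());
    }
}
```

In the leaky case the first 20 borrows succeed and every later one times out, while the try-with-resources version never runs out. In a real thread dump, the corresponding sign would be several threads parked inside the pool's borrow call, so that is worth searching for in the dumps requested above.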
