We ran that command - it appears it is the site-to-site connections that are causing the issue.
We had a lot of remote process groups that weren't even being used (no data was
being sent to that part of the dataflow), yet when running the lsof command
they each had a large number of open files - almost 2k! - showing CLOSE_WAIT.
Again, no flowfiles were being sent to them, so could this be some kind of
bug where merely keeping a remote process group enabled opens files and never
closes them? (BTW, the reason we had to upgrade from 1.9.2 to 1.11.0 was
that we had upgraded our Java version and that caused an
IllegalBlockingModeException - is it possible that whatever fixed that problem
is now causing the open files issue?)
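For reference, the check we ran was along these lines (a rough sketch - how
you locate the NiFi pid will depend on your install):

    # Find the NiFi JVM pid (assumes exactly one NiFi instance on the host)
    NIFI_PID=$(pgrep -f org.apache.nifi.NiFi)

    # Count the descriptors lsof reports as stuck in CLOSE_WAIT
    lsof -p "$NIFI_PID" | grep -c CLOSE_WAIT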
We have now disabled all of the unused remote process groups. We still have
several remote process groups that we are actively using, so if this is the
issue it might be difficult to avoid entirely, but at least we have decreased
the number of remote process groups we have. Another approach we are trying is
to run MergeContent before sending to the NiFi having the most issues, so that
fewer flowfiles are sent at once over site-to-site, and then to split them
apart after they are received.
Thank you!
On Thursday, February 6, 2020, 2:19:48 PM EST, Mike Thomsen
<[email protected]> wrote:
Can you share a description of your flows in terms of average flowfile size,
queue size, data velocity, etc.?
Thanks,
Mike
On Thu, Feb 6, 2020 at 1:59 PM Elli Schwarz <[email protected]>
wrote:
We seem to be experiencing the same problems. We recently upgraded several of
our NiFi instances from 1.9.2 to 1.11.0, and now many of them are failing with
"too many open files". Nothing else changed other than the upgrade, and our
data volume is the same as before. The only solution we've been able to come
up with is to run a script that checks for this condition and restarts NiFi
(sketched below). Any other ideas?
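The watchdog is roughly this shape (a rough sketch - the threshold, the pid
lookup, and the /opt/nifi install path are specific to our setup):

    #!/bin/sh
    # Restart NiFi when its open-descriptor count crosses a threshold.
    # Assumes a single NiFi JVM on the host; /opt/nifi is a placeholder path.
    THRESHOLD=40000
    NIFI_PID=$(pgrep -f org.apache.nifi.NiFi)
    OPEN=$(ls /proc/"$NIFI_PID"/fd 2>/dev/null | wc -l)
    if [ "$OPEN" -gt "$THRESHOLD" ]; then
        /opt/nifi/bin/nifi.sh restart
    fi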
Thank you!
On Sunday, February 2, 2020, 9:11:34 AM EST, Mike Thomsen
<[email protected]> wrote:
Without further details, this is what I did to see if it was something
other than the usual issue of not having enough file handles available -
something like a legitimate case of code forgetting to close file
objects.
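(To rule that out quickly, something like this shows the effective limit for
the running JVM on Linux - a sketch, assuming a single NiFi process:)

    # Effective per-process limits of the running NiFi JVM
    grep 'open files' /proc/$(pgrep -f org.apache.nifi.NiFi)/limits

    # Limit that new processes in this shell would inherit
    ulimit -n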
1. Set up an 8-core/32GB VM on AWS w/ Amazon AMI.
2. Pushed 1.11.1 RC1.
3. Pushed the JVM heap settings to 6GB/12GB (the config lines for this and
step 4 are sketched after this list).
4. Disabled flowfile archiving because I only allocated 8GB of storage.
5. Set up a flow that used 2 GenerateFlowFile instances to generate massive
amounts of garbage data using all available cores. (All queues were set up
to hold 250k flowfiles.)
6. Kicked it off and let it run for about 20 minutes.
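The config changes for steps 3 and 4 were along these lines (a sketch - the
java.arg numbering varies between bootstrap.conf versions):

    # conf/bootstrap.conf - JVM heap
    java.arg.2=-Xms6g
    java.arg.3=-Xmx12g

    # conf/nifi.properties - turn off content archiving
    nifi.content.repository.archive.enabled=false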
No apparent problem with closing and releasing resources here.
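If anyone wants to repeat this, watching the descriptor count while the flow
runs is enough to spot a leak - something like this on Linux (assumes a
single NiFi JVM):

    # Refresh the open-descriptor count for the NiFi JVM every 5 seconds
    watch -n 5 'ls /proc/$(pgrep -f org.apache.nifi.NiFi)/fd | wc -l'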
On Sat, Feb 1, 2020 at 8:00 AM Joe Witt <[email protected]> wrote:
> these are usually very easy to find.
>
> run lsof -p <pid> and share the results.
>
>
> thanks
>
> On Sat, Feb 1, 2020 at 7:56 AM Mike Thomsen <[email protected]>
> wrote:
>
> >
> >
> https://stackoverflow.com/questions/59991035/nifi-1-11-opening-more-than-50k-files/60017064#60017064
> >
> > No idea if this is valid or not. I asked for clarification to see if
> there
> > might be a specific processor or something that is triggering this.
> >
>