We ran that command - it appears it's the site-to-sites that are causing the 
issue. We had a lot of remote process groups that weren't even being used (no 
data was being sent to that part of the dataflow), yet when we ran the lsof 
command each of them had a large number of open files - almost 2k! - showing 
CLOSE_WAIT. Again, there were no flowfiles being sent to them, so could this be 
some kind of bug where simply keeping a remote process group enabled is somehow 
opening files and not closing them? (BTW, the reason we had to upgrade from 
1.9.2 to 1.11.0 was that we had upgraded our Java version and that caused an 
IllegalBlockingModeException - is it possible that whatever fixed that problem 
is now causing the issue with open files?)
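
In case it helps anyone reproduce the check, something along these lines shows 
the counts (substitute the pid of the NiFi JVM; lsof output varies a little by 
platform):

    # total open files held by the NiFi process
    lsof -p <nifi_pid> | wc -l

    # just the sockets stuck in CLOSE_WAIT
    lsof -p <nifi_pid> | grep CLOSE_WAIT | wc -l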

We have now disabled all of the unused remote process groups. We still have 
several remote process groups that we are actively using, so if this is the 
issue it might be difficult to avoid entirely, but at least we've decreased the 
number of remote process groups we have. Another approach we are trying is to 
run a MergeContent before we send to the NiFi having the most issues, so that 
fewer flowfiles are sent at once over site-to-site, and then split them apart 
after they are received (roughly as sketched below).
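
The packaging idea looks roughly like this - the numbers are placeholders we're 
still tuning, and UnpackContent is just one way to do the splitting on the 
receiving side:

    Sending side:   MergeContent
                      Merge Strategy            = Bin-Packing Algorithm
                      Merge Format              = FlowFile Stream, v3
                      Minimum Number of Entries = 1000   (tune to your volume)
                      Max Bin Age               = 30 sec (so data doesn't wait too long)
                    -> Remote Process Group (site-to-site)

    Receiving side: UnpackContent
                      Packaging Format = flowfile-stream-v3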
Thank you!

    On Thursday, February 6, 2020, 2:19:48 PM EST, Mike Thomsen 
<[email protected]> wrote:  
 
 Can you share a description of your flows in terms of average flowfile size, 
queue size, data velocity, etc.?
Thanks,
Mike

On Thu, Feb 6, 2020 at 1:59 PM Elli Schwarz <[email protected]> 
wrote:

We seem to be experiencing the same problems. We recently upgraded several of 
our NiFi instances from 1.9.2 to 1.11.0, and now many of them are failing with 
"too many open files". Nothing else changed other than the upgrade, and our 
data volume is the same as before. The only solution we've been able to come up 
with is to run a script to check for this condition and restart NiFi. Any other 
ideas?
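
(For what it's worth, the "script" is nothing fancy - something along these 
lines, with the threshold and install path adjusted to each box:)

    #!/bin/bash
    # crude watchdog: restart NiFi if its open-file count gets near the ulimit
    PID=$(pgrep -f org.apache.nifi.NiFi | head -n 1)
    OPEN=$(lsof -p "$PID" | wc -l)
    if [ "$OPEN" -gt 40000 ]; then      # 40000 is a placeholder threshold
        /opt/nifi/bin/nifi.sh restart   # path depends on the install
    fi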
Thank you!

    On Sunday, February 2, 2020, 9:11:34 AM EST, Mike Thomsen 
<[email protected]> wrote:  

 Without further details, this is what I did to see if it was something
other than the usual issue of not having enough file handles available -
something like a legitimate case of the code itself forgetting to close
file objects.

1. Set up an 8-core/32GB VM on AWS with the Amazon AMI.
2. Pushed 1.11.1 RC1 to it.
3. Pushed the RAM settings to 6GB/12GB.
4. Disabled flowfile archiving because I only allocated 8GB of storage.
5. Set up a flow that used 2 GenerateFlowFile instances to generate massive
amounts of garbage data using all available cores. (All queues were set up
to hold 250k flowfiles.)
6. Kicked it off and let it run for probably about 20 minutes.

No apparent problem with closing and releasing resources here.
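
(For reference, the easiest way I know of to keep an eye on this while such a 
flow runs is to watch the fd count of the NiFi JVM - adjust the pid lookup to 
however the process shows up on your box:)

    # refresh the open fd count for the NiFi JVM every 5 seconds
    PID=$(pgrep -f org.apache.nifi.NiFi | head -n 1)
    watch -n 5 "ls /proc/$PID/fd | wc -l"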

On Sat, Feb 1, 2020 at 8:00 AM Joe Witt <[email protected]> wrote:

> these are usually very easy to find.
>
> run lsof -p <pid> and share the results.
>
>
> thanks
>
> On Sat, Feb 1, 2020 at 7:56 AM Mike Thomsen <[email protected]>
> wrote:
>
> >
> >
> https://stackoverflow.com/questions/59991035/nifi-1-11-opening-more-than-50k-files/60017064#60017064
> >
> > No idea if this is valid or not. I asked for clarification to see if
> there
> > might be a specific processor or something that is triggering this.
> >
>
  
  
