My setup was very similar, but didn't have the site to site reporting.

On Thu, Feb 6, 2020 at 5:13 PM Joe Witt <[email protected]> wrote:
> yeah will investigate
>
> thanks
>
> On Thu, Feb 6, 2020 at 4:49 PM Ryan Hendrickson <[email protected]> wrote:
>
> > Joe,
> > We're running:
> >
> > - OpenJDK Java 1.8.0_242
> > - NiFi 1.11.0
> > - CentOS Linux 7.7.1908
> >
> > We're seeing this across a dozen NiFi's with the same setup. To
> > reproduce the issue, Generate Flow Files 100GB across a couple million
> > files -> Site to Site -> Receive data -> Merge Content. We had no
> > issues with this stack:
> >
> > - OpenJDK Java 1.8.0_232.
> > - NiFi 1.9.2
> > - CentOS Linux 7.7.1908
> >
> > Can your team setup a similar stack and test?
> >
> > Ryan
> >
> > On Thu, Feb 6, 2020 at 4:15 PM Joe Witt <[email protected]> wrote:
> >
> > > received a direct reply - Elli cannot share.
> > >
> > > I think unless someone else is able to replicate the behavior there
> > > isn't much more we can tackle on this.
> > >
> > > Thanks
> > >
> > > On Thu, Feb 6, 2020 at 4:10 PM Joe Witt <[email protected]> wrote:
> > >
> > > > Yes Elli it is possible. Can we please get those lsof outputs in
> > > > a JIRA? As well as more details about configuration?
> > > >
> > > > Thanks
> > > >
> > > > On Thu, Feb 6, 2020 at 2:44 PM Andy LoPresto <[email protected]> wrote:
> > > >
> > > > > I have no input on the specific issue you’re encountering, but
> > > > > a pattern we have seen to reduce the overhead of multiple
> > > > > remote input ports being required is to use a “central” remote
> > > > > input port and immediately follow it with a RouteOnAttribute to
> > > > > distribute specific flowfiles to the appropriate downstream
> > > > > flow / process group. Whatever sends data to this port can use
> > > > > an UpdateAttribute to add some “tracking/routing” attribute on
> > > > > the flowfiles before being sent. Inserting Merge/Split will
> > > > > likely affect your timing due to waiting for bins to fill,
> > > > > depending on your volume.
> > > > > S2S is pretty good at transmitting data on-demand with low
> > > > > overhead on one port; it’s when you have many remote input
> > > > > ports that there is substantial overhead.
> > > > >
> > > > > Andy LoPresto
> > > > > [email protected]
> > > > > [email protected]
> > > > > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
> > > > >
> > > > > > On Feb 6, 2020, at 2:34 PM, Elli Schwarz <[email protected]> wrote:
> > > > > >
> > > > > > We ran that command - it appears the site-to-sites are
> > > > > > causing the issue. We had a lot of remote process groups that
> > > > > > weren't even being used (no data was being sent to that part
> > > > > > of the dataflow), yet when running the lsof command they each
> > > > > > had a large number of open files - almost 2k! - showing
> > > > > > CLOSE_WAIT. Again, there were no flowfiles being sent to
> > > > > > them, so can it be some kind of bug that keeping a remote
> > > > > > process group open is somehow opening files and not closing
> > > > > > them? (BTW, the reason we had to upgrade from 1.9.2 to 1.11.0
> > > > > > was because we had upgraded our Java version and that caused
> > > > > > an IllegalBlockingModeException - is it possible that
> > > > > > whatever fixed that problem is now causing an issue with open
> > > > > > files?)
> > > > > >
> > > > > > We now disabled all of the unused remote process groups. We
> > > > > > still have several remote process groups that we are using so
> > > > > > if this is the issue it might be difficult to avoid, but at
> > > > > > least we decreased the number of remote process groups we
> > > > > > have. Another approach we are trying is a merge content
> > > > > > before we send to the Nifi having the most issues, to have
> > > > > > fewer flow files sent at once site to site, and then
> > > > > > splitting them after they are received.
> > > > > > Thank you!
> > > > > >
> > > > > > On Thursday, February 6, 2020, 2:19:48 PM EST, Mike Thomsen <[email protected]> wrote:
> > > > > >
> > > > > > Can you share a description of your flows in terms of average
> > > > > > flowfile size, queue size, data velocity, etc.?
> > > > > > Thanks,
> > > > > > Mike
> > > > > >
> > > > > > On Thu, Feb 6, 2020 at 1:59 PM Elli Schwarz <[email protected]> wrote:
> > > > > >
> > > > > > We seem to be experiencing the same problems. We recently
> > > > > > upgraded several of our Nifis from 1.9.2 to 1.11.0, and now
> > > > > > many of them are failing with "too many open files". Nothing
> > > > > > else changed other than the upgrade, and our data volume is
> > > > > > the same as before. The only solution we've been able to come
> > > > > > up with is to run a script to check for this condition and
> > > > > > restart the Nifi. Any other ideas?
> > > > > > Thank you!
> > > > > >
> > > > > > On Sunday, February 2, 2020, 9:11:34 AM EST, Mike Thomsen <[email protected]> wrote:
> > > > > >
> > > > > > Without further details, this is what I did to see if it was
> > > > > > something other than the usual issue of having not enough
> > > > > > file handles available. Something like a legitimate case of
> > > > > > someone forgetting to close file objects or something in the
> > > > > > code itself.
> > > > > >
> > > > > > 1. Set up an 8core/32GB VM on AWS w/ Amazon AMI.
> > > > > > 2. Pushed 1.11.1RC1
> > > > > > 3. Pushed the RAM settings to 6/12GB
> > > > > > 4. Disabled flowfile archiving because I only allocated 8GB
> > > > > >    of storage.
> > > > > > 5. Set up a flow that used 2 generateflow instances to
> > > > > >    generate massive amounts of garbage data using all
> > > > > >    available cores. (All queues were set up to hold 250k flow
> > > > > >    files)
> > > > > > 6. Kicked it off and let it run for probably about 20 minutes.
> > > > > >
> > > > > > No apparent problem with closing and releasing resources here.
> > > > > >
> > > > > > On Sat, Feb 1, 2020 at 8:00 AM Joe Witt <[email protected]> wrote:
> > > > > >
> > > > > > > these are usually very easy to find.
> > > > > > >
> > > > > > > run lsof -p pid. and share results
> > > > > > >
> > > > > > > thanks
> > > > > > >
> > > > > > > On Sat, Feb 1, 2020 at 7:56 AM Mike Thomsen <[email protected]> wrote:
> > > > > > >
> > > > > > > > https://stackoverflow.com/questions/59991035/nifi-1-11-opening-more-than-50k-files/60017064#60017064
> > > > > > > >
> > > > > > > > No idea if this is valid or not. I asked for clarification
> > > > > > > > to see if there might be a specific processor or something
> > > > > > > > that is triggering this.
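
For anyone who wants to summarize `lsof -p pid` output before attaching it to a JIRA, here is a rough sketch of how the CLOSE_WAIT counts mentioned in the thread could be tallied. It assumes the usual lsof text layout, where TCP rows carry a `TCP` column and end with the state in parentheses. The function name `count_tcp_states` and the sample rows (host names, PID, paths) are made up for illustration and are not taken from anyone's actual output.

```python
import re
from collections import Counter

# Matches the trailing "(STATE)" that lsof prints after a TCP socket name,
# e.g. "host:1234->peer:8443 (CLOSE_WAIT)".
_STATE_RE = re.compile(r"\((?P<state>[A-Z_0-9]+)\)\s*$")


def count_tcp_states(lsof_output: str) -> Counter:
    """Count TCP connection states in the text printed by `lsof -p <pid>`."""
    states = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Only look at rows that lsof itself labels as TCP sockets.
        if "TCP" not in fields:
            continue
        match = _STATE_RE.search(line)
        if match:
            states[match.group("state")] += 1
    return states


# Hypothetical sample rows, invented for this example only.
SAMPLE = """\
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    4242 nifi  101u  IPv6  55001      0t0  TCP nifi-a:51234->nifi-b:8443 (CLOSE_WAIT)
java    4242 nifi  102u  IPv6  55002      0t0  TCP nifi-a:51236->nifi-b:8443 (CLOSE_WAIT)
java    4242 nifi  103u  IPv6  55003      0t0  TCP nifi-a:51240->nifi-b:8443 (ESTABLISHED)
java    4242 nifi  104r   REG  253,0     4096  131 /opt/nifi/logs/nifi-app.log
java    4242 nifi  105u  IPv6  55004      0t0  TCP *:8443 (LISTEN)
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(f"{state}: {count}")
```

To use it on a real node, save the output of `lsof -p pid` to a file and pass its contents to `count_tcp_states`. A large and growing CLOSE_WAIT count would match what Elli described, but the sketch only counts rows and says nothing about which component is leaving the sockets open.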
