Ben,

There are three things that I've seen cause really massive FlowFile Repositories:

1) An OutOfMemoryError occurs that causes NiFi to stop working properly.
2) The "nifi.flowfile.repository.checkpoint.interval" property is set really long (2 mins is the default).
3) By far the most common: the system runs out of available file handles.

You can check the hard and soft limits on open file handles by running "ulimit -Hn" and "ulimit -Sn". We recommend that at least 50,000 be set, but the default on most Linux-based operating systems is much smaller, like 4,096. The Admin Guide [1] will guide you through increasing this value, if this is the problem.

Thanks
-Mark

[1] http://nifi.apache.org/docs/nifi-docs/html/administration-guide.html

On Apr 26, 2018, at 5:26 AM, 尹文才 <[email protected]> wrote:

Hi guys,

Thanks for all your answers. I have actually seen the flowfile repo on one of our OpenStack CentOS 7 machines grow to about 30 GB, which used up all the disk space allocated to the virtual machine. The flow inside NiFi couldn't proceed, and many errors started to appear, such as failures to checkpoint. We currently use NiFi as an ETL tool to extract data from SQL Server for data analysis.

I have no idea why the flowfile repo would grow like this; as I understand it, it is only used to store flowfile attributes. It would be great if there were some options to limit the flowfile repo size. Thanks.

Regards,
Ben

2018-04-26 2:08 GMT+08:00 Brandon DeVries <[email protected]>:

All,

This is something I think we shouldn't dismiss so easily. While the FlowFile repo is lighter than the content repo, allowing it to grow too large can cause major problems.

Specifically, an "overgrown" FlowFile repo may prevent a NiFi instance from coming back up after a restart due to the way in which records are held in memory. If there is more memory available to give to the JVM, this can sometimes be worked around... but if there isn't you may just be out of luck.
For that matter, allowing the FlowFile repo to grow so large that it consumes all the heap isn't going to be good for system health in general (OOM is probably never where you want to be...).

To Pierre's point "you don't want to limit that repository in size since it would prevent the workflows from creating new flow files"... that's exactly why I would want to limit the size of the repo. You do then get into questions of how exactly to do this. For example, you may not want to simply block all transactions that create a FlowFile, because such a transaction may remove even more FlowFiles than it creates (e.g. MergeContent). Additionally, you have to be concerned about deadlocks (e.g. a "Wait" that hangs forever because its "Notify" is being starved). Or, perhaps that's all you can do... freeze everything at some threshold prior to actual damage being done, and alert operators that manual intervention is necessary (e.g. bring up the graph with autoResume=false, and bleed off data in a controlled fashion).

In summary, I believe this is a problem. Even if it doesn't come up often, when it does it is significant. While the solution likely isn't simple, it's worth putting some thought towards.

Brandon

On Wed, Apr 25, 2018 at 9:43 AM Sivaprasanna <[email protected]> wrote:

No, he actually had mentioned “like content repository”. The answer is, there aren’t any properties that support this, AFAIK. Pierre’s response pretty much sums up why there aren’t any properties.

Thanks,
Sivaprasanna

On Wed, 25 Apr 2018 at 7:10 PM, Mike Thomsen <[email protected]> wrote:

I have a feeling that what Ben meant was how to limit the content repository size.

On Wed, Apr 25, 2018 at 8:26 AM Pierre Villard <[email protected]> wrote:

Hi Ben,

Since the flow file repository contains the information of the flow files currently being processed by NiFi, you don't want to limit that repository in size since it would prevent the workflows from creating new flow files.
Besides, this repository is very lightweight, so I'm not sure it would need to be limited in size. Do you have a specific use case in mind?

Pierre

2018-04-25 9:15 GMT+02:00 尹文才 <[email protected]>:

Hi guys,

I checked NiFi's System Administrator's Guide for a configuration item that would limit the size of the flowfile repository, similar to the other repositories (e.g. the content repository), but I didn't find one. Is there currently any configuration to limit the flowfile repository's size? Thanks.

Regards,
Ben
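
For the third cause in Mark's reply at the top of the thread (running out of file handles), here is a minimal sketch of the same check that "ulimit -Sn" and "ulimit -Hn" perform, written in Python. It is only an illustration, not part of NiFi: the helper name check_limits and the constant RECOMMENDED_MIN_HANDLES are made up for this example, and the 50,000 figure is simply the one quoted in the thread. It assumes a Unix-like system and should be run as the same user that runs NiFi, since limits are per user/process.

```python
import resource  # Unix-only standard-library module

# Figure quoted in Mark's reply; an assumption to adjust for your own setup.
RECOMMENDED_MIN_HANDLES = 50_000


def check_limits(soft, hard, recommended=RECOMMENDED_MIN_HANDLES):
    """Return one warning string per limit that is below `recommended`."""
    warnings = []
    for name, value in (("soft", soft), ("hard", hard)):
        # RLIM_INFINITY means "no limit", so it can never be too low.
        if value != resource.RLIM_INFINITY and value < recommended:
            warnings.append(
                f"{name} open-file limit is {value}, below {recommended}"
            )
    return warnings


if __name__ == "__main__":
    # Limits inherited from the shell that launched this script.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    problems = check_limits(soft, hard)
    for line in problems:
        print("WARNING:", line)
    if not problems:
        print(f"OK: soft={soft}, hard={hard}")
```

If a warning is printed, the Admin Guide linked in Mark's reply describes how to raise the limit. Note that this reports the limit only; it does not show how many handles the NiFi process currently has open.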
