[ https://issues.apache.org/jira/browse/NIFI-7874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Villard resolved NIFI-7874.
----------------------------------
    Fix Version/s:     (was: 1.11.4)
                   1.13.0
         Assignee: Mark Payne
       Resolution: Fixed

> S3List processor in v1.12.1 uses lots of CPU power and RAM
> ----------------------------------------------------------
>
>                 Key: NIFI-7874
>                 URL: https://issues.apache.org/jira/browse/NIFI-7874
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.12.1
>         Environment: CentOS 7, Amazon Cloud, 8 CPU cores, 64 GB RAM
>            Reporter: Dominik Dresel
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 1.13.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We are using the S3List processor to collect our log data from S3 and process
> it further. In NiFi version 1.11.4 the processor reads a log file from S3,
> creates a flow file out of it, routes it to success and repeats its loop from
> the beginning. This is fast and does not need a lot of resources. We can
> operate NiFi at the default 512 MB RAM with 8 CPU cores, which are utilized
> at roughly 50%.
> With the new version of the S3List processor (v1.12.1) the flow files seem to
> get cached in memory while the files on S3 are enumerated. Because of this,
> we set the Xmx and Xms parameters in bootstrap.conf to 4 GB, which does not
> suffice (we get an exception from AWS at some point). While the collection of
> the S3 entries is in progress, all 8 CPU cores are utilized at 100% and the
> RAM gets eaten up. This is especially bad because NiFi then does not have the
> resources to contact its external ZooKeeper and gets kicked out of the
> cluster. It is also no longer possible to use the web UI.
> This behavior won't show up if you have just a few objects in S3, because
> they can easily be cached in memory, but we have millions of entries in our
> S3, which will eat up the RAM of the machine.
> Maybe it would be a good thing to have an additional parameter for the
> processor which sets after how many created flow files they have to be routed
> to success.
>
> If you need any more log files I would be happy to provide them!
>
> BTW: NiFi is great :) Very easy to use and (normally) very economical about
> resources.
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
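
The parameter suggested in the description (route flow files to success after a fixed number have been created) can be illustrated with a small sketch. This is not NiFi code and not necessarily how the issue was fixed in 1.13.0. The names list_keys, emit_in_batches, batch_size and commit are hypothetical, and the S3 listing is simulated so the example runs without AWS access. It only shows that the number of entries held in memory stays bounded by the batch size, however many objects are listed.

```python
from typing import Callable, Iterable, Iterator, List


def list_keys(total: int, page_size: int = 1000) -> Iterator[List[str]]:
    """Hypothetical stand-in for a paginated S3 listing: yields pages of keys."""
    for start in range(0, total, page_size):
        end = min(start + page_size, total)
        yield [f"logs/object-{i}" for i in range(start, end)]


def emit_in_batches(pages: Iterable[List[str]],
                    batch_size: int,
                    commit: Callable[[List[str]], None]) -> int:
    """Hand entries downstream every `batch_size` keys.

    Returns the largest number of entries that were buffered at any time.
    """
    buffer: List[str] = []
    peak = 0
    for page in pages:
        for key in page:
            buffer.append(key)
            peak = max(peak, len(buffer))
            if len(buffer) >= batch_size:
                commit(buffer)  # assumed analogue of "route to success"
                buffer = []     # start a fresh buffer; old one can be freed
    if buffer:                  # flush the final partial batch
        commit(buffer)
    return peak


# Record only the size of each committed batch, not its contents.
committed: List[int] = []
peak = emit_in_batches(list_keys(total=10_250), batch_size=500,
                       commit=lambda batch: committed.append(len(batch)))
print(peak, len(committed), sum(committed))
```

With 10,250 simulated objects and a batch size of 500, the buffer never holds more than 500 entries, whereas collecting the whole listing before committing would hold all 10,250 at once.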