[ 
https://issues.apache.org/jira/browse/NIFI-7874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard resolved NIFI-7874.
----------------------------------
    Fix Version/s:     (was: 1.11.4)
                   1.13.0
         Assignee: Mark Payne
       Resolution: Fixed

> S3List processor in v1.12.1 uses lots of CPU power and RAM
> ----------------------------------------------------------
>
>                 Key: NIFI-7874
>                 URL: https://issues.apache.org/jira/browse/NIFI-7874
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.12.1
>         Environment: Centos 7, Amazon Cloud, 8 CPU cores, 64 GB RAM
>            Reporter: Dominik Dresel
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 1.13.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We are using the S3List processor to collect our log data from S3 and process
> it further. In NiFi version 1.11.4 the processor reads a log file from S3,
> creates a flow file out of it, routes it to success and repeats its loop from
> the beginning. This is fast and does not need a lot of resources. We can
> operate NiFi at the default 512 MB of RAM with 8 CPU cores, which are
> utilized at roughly 50%.
> With the new version of the S3List processor (v1.12.1) the flow files seem to
> get cached in memory while the files on S3 are enumerated. Because of this,
> we set the Xmx and Xms parameters in bootstrap.conf to 4 GB, which still does
> not suffice (we eventually get an exception from AWS). While the collection
> of the S3 entries is in progress, all 8 CPU cores are utilized at 100% and
> the RAM gets eaten up. This is especially bad because NiFi then does not have
> the resources to contact its external ZooKeeper and gets kicked out of the
> cluster. The web UI also becomes unusable.
> This behavior won't show up if you have just a few objects in S3, because they
> can easily be cached in memory, but we have millions of entries in S3, which
> eat up the RAM of the machine.
> Maybe it would be good to have an additional processor parameter that sets
> how many flow files are created before they are routed to success.
>  
> If you need any more log files, I would be happy to provide them!
>  
> BTW: NiFi is great :) Very easy to use and (normally) very economical with
> resources.
>  
>  
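
The batching parameter the reporter suggests could look roughly like the
following. This is only a minimal, generic sketch of the idea, not the
processor's actual implementation and not the change that resolved this issue.
The names batched, fake_listing, list_and_route and batch_size are all
hypothetical, and fake_listing merely stands in for a paginated S3 listing.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(entries: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size entries without materializing the full listing."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(entries)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def fake_listing(total: int) -> Iterator[str]:
    """Hypothetical stand-in for a paginated S3 listing; yields keys lazily."""
    for i in range(total):
        yield f"logs/object-{i:07d}.log"


def list_and_route(entries: Iterable[str], batch_size: int,
                   route: Callable[[List[str]], None]) -> int:
    """Hand each batch to route() as soon as it is full; return the number of entries routed."""
    routed = 0
    for batch in batched(entries, batch_size):
        route(batch)  # e.g. transfer to "success" and commit before listing further
        routed += len(batch)
    return routed
```

With this shape, at most batch_size entries are held in memory at any time,
regardless of how many objects the listing returns in total.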



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
