[ 
https://issues.apache.org/jira/browse/NIFI-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483946#comment-16483946
 ] 

ASF GitHub Bot commented on NIFI-5225:
--------------------------------------

GitHub user FrederikP opened a pull request:

    https://github.com/apache/nifi/pull/2732

    NIFI-5225: Purge event data from event repository when Connectable is 
removed

    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?
    
    - [x] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [ ] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
    _Clean install ran through just fine, but contrib-check complained about an 
unrelated package_
    - [x] Have you written or updated unit tests to verify your changes?
    - [ ] ~~If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?~~ 
    - [ ] ~~If applicable, have you updated the LICENSE file, including the 
main LICENSE file under nifi-assembly?~~
    - [ ] ~~If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?~~
    - [ ] ~~If adding new Properties, have you added .displayName in addition 
to .name (programmatic access) for each of the new properties?~~
    
    ### For documentation related changes:
    - ~~[ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?~~
    
    I introduced the option to purge data from the FlowFileEventRepository (the 
5 min ring buffer) to fix this:
    https://issues.apache.org/jira/browse/NIFI-5225
    
    And it works for our setup.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/FrederikP/nifi master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/2732.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2732
    
----
commit 4e5a118305c9513cca239c136c48239c501e9907
Author: Frederik Petersen <fp@...>
Date:   2018-05-22T10:55:59Z

    NIFI-5225: Purge event data from event repository when Connectable is 
removed

----


> Leak in RingBufferEventRepository for frequently updated flows
> --------------------------------------------------------------
>
>                 Key: NIFI-5225
>                 URL: https://issues.apache.org/jira/browse/NIFI-5225
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.5.0, 1.6.0
>         Environment: HDF-3.1.0.0
>            Reporter: Frederik Petersen
>            Priority: Major
>              Labels: performance
>
> We use NiFi's API to change a part of our flow quite frequently. Over the 
> past weeks we have noticed that the performance of web requests degrades over 
> time, and we had a very hard time finding out why.
> Today I took a closer look. When using VisualVM to sample CPU, it 
> stood out that the longer the cluster had been running, the more time was 
> spent in 'SecondPrecisionEventContainer.generateReport()' during web requests. 
> This method is already used heavily right after starting the cluster (for big 
> flows and process groups), but the time spent in it increases (in our setup) 
> the longer the cluster runs. This increases the latency of almost every web 
> request. Our flow reconfiguration script (calling many NiFi API endpoints) 
> went from 2 minutes to 20 minutes of run time within a few days.
>  Looking at the source code I couldn't quite figure out why the run time 
> should increase over time, because the ring buffers always stay the same size 
> (301 entries|5 minutes).
> When sampling memory I noticed quite a lot of EventSum instances, more than 
> there should have been. So I took a heap dump and ran a MemoryAnalyzer tool 
> on it. The "Leak Suspects" overview gave me the final hint as to what was 
> wrong.
>  It reported:
> One instance of "java.util.concurrent.ConcurrentHashMap$Node[]" loaded by 
> "<system class loader>" occupies 5,649,926,328 (55.74%) bytes. The instance 
> is referenced by 
> org.apache.nifi.controller.repository.metrics.RingBufferEventRepository @ 
> 0x7f86c50cda40 , loaded by "org.apache.nifi.nar.NarClassLoader @ 
> 0x7f86a0000000". The memory is accumulated in one instance of 
> "java.util.concurrent.ConcurrentHashMap$Node[]" loaded by "<system class 
> loader>".
> The issue is:
> When we remove processors, connections, or process groups from the flow, 
> their data is not removed from the ConcurrentHashMap in 
> RingBufferEventRepository. There is a 'purgeTransferEvents' method, but it 
> only calls an empty 'purgeEvents' method on all 
> 'SecondPrecisionEventContainer's in the map.
> This means that the map grows without bounds, and every time 
> 'reportTransferEvents' is called it iterates over all entries of the map 
> (more and more of them over time). This increases the latency of every web 
> request and also keeps a huge amount of memory occupied.
> A rough idea to fix this:
> Remove the entry for each removed component (processor, process group, 
> connection, ?...) using their onRemoved methods in the FlowController.
> This should stop the map from growing indefinitely for any flow where 
> components are removed frequently, especially when removal is automated.
> Since this is quite urgent for us, I'll try to work on a fix for this and 
> provide a pull request if successful.
> Since no one noticed this before, I guess we are not typical users of 
> NiFi: we thought it was possible to heavily reconfigure flows using the 
> API, but with this performance issue, it's not.
> Please let me know if I can provide any more helpful detail for this problem.
>  
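
The leak and the proposed fix described above can be illustrated with a 
minimal sketch. This is not NiFi's code: the class EventRepositorySketch and 
all of its methods are hypothetical stand-ins, assuming only what the report 
states, namely a ConcurrentHashMap keyed by component that is iterated in full 
on every report and never shrinks unless entries are removed explicitly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of an event repository keyed by component ID.
// The names here are illustrative assumptions, not NiFi's actual classes or API.
class EventRepositorySketch {
    private final Map<String, AtomicLong> countsByComponent = new ConcurrentHashMap<>();

    // Record one event; an entry is created the first time a component reports.
    void recordEvent(String componentId) {
        countsByComponent.computeIfAbsent(componentId, id -> new AtomicLong()).incrementAndGet();
    }

    // Reporting walks every entry, so its cost grows with the number of
    // entries, including entries for components that no longer exist.
    long reportTotal() {
        long total = 0;
        for (AtomicLong count : countsByComponent.values()) {
            total += count.get();
        }
        return total;
    }

    // The proposed fix: drop a component's entry when the component is removed.
    void purgeEvents(String componentId) {
        countsByComponent.remove(componentId);
    }

    int trackedComponents() {
        return countsByComponent.size();
    }

    public static void main(String[] args) {
        EventRepositorySketch repo = new EventRepositorySketch();

        // Simulate a flow that keeps creating short-lived components.
        for (int i = 0; i < 1000; i++) {
            repo.recordEvent("processor-" + i);
        }
        System.out.println(repo.trackedComponents()); // prints 1000

        // Purging on removal keeps the map bounded by the live components.
        for (int i = 0; i < 1000; i++) {
            repo.purgeEvents("processor-" + i);
        }
        System.out.println(repo.trackedComponents()); // prints 0
    }
}
```

Without the purge calls, every short-lived component leaves an entry behind, 
which matches the unbounded growth seen in the heap dump.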



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
