[ 
https://issues.apache.org/jira/browse/NIFI-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Payne updated NIFI-5466:
-----------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Improve efficiency of RingBufferEventRepository
> -----------------------------------------------
>
>                 Key: NIFI-5466
>                 URL: https://issues.apache.org/jira/browse/NIFI-5466
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 1.8.0
>
>
> The FlowFileEventRepository implementation (i.e., the 
> RingBufferEventRepository) keeps 300 buckets for each component. Each bucket 
> holds a set of 16+ stats (bytes read, bytes written, flowfiles in, flowfiles 
> out, etc., as well as counters).
> Each time that a user refreshes stats, we calculate the 5-minute window for 
> each component by iterating over all 300 buckets, then summing together the 
> values of the stats. So, for every 1,000 processors, we have 1,000 * 300 * 16 
> = 4.8 million calculations being performed each time that the user refreshes.
> We can improve this significantly. To do this, we can keep a 'running tally' 
> for each stat. Every time that we update a bucket in the 
> FlowFileEventRepository, we should similarly update the running tally. Each 
> time that we 'replace' a bucket in the FlowFileEventRepository, we should 
> subtract from the running tally the values in the bucket, in addition to 
> adding in the new value. This would result in amortizing the cost of 
> calculation over the 5-minute period and means that every time that we get 
> the processor stats we could do so by simply performing a 'get' operation 
> without any calculations.
> But wait, there's more! The EventSumValue class has an {{add(FlowFileEvent)}} 
> method. If the EventSumValue were to simply keep a boolean value indicating 
> whether or not it was empty, then when adding to an empty bucket we could 
> avoid the calculations altogether, and when replacing an empty bucket we 
> could avoid the calculation against the running tally as well.
> Similarly, in the VolatileComponentStatusRepository, when we perform a 
> 'capture' we should detect any 'empty' stats objects and use a single static 
> instance for this, which would significantly reduce the amount of heap that 
> is used to store metrics.
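
The running-tally and empty-bucket ideas in the description above can be sketched roughly as follows. This is a minimal illustration, not NiFi's actual code: the class name RunningTallySketch, its fields, and its methods are all hypothetical, it tracks a single stat instead of 16+, and it assumes non-negative timestamps in whole seconds.

```java
import java.util.Arrays;

/** Hypothetical sketch only; not the real RingBufferEventRepository. */
final class RunningTallySketch {
    private final long[] bucketValues;   // one stat per bucket, for brevity
    private final long[] bucketSeconds;  // the second each bucket currently represents
    private final boolean[] bucketEmpty; // true while a bucket holds no data
    private long runningTally;           // sum of all buckets currently in the ring

    RunningTallySketch(int numBuckets) {
        bucketValues = new long[numBuckets];
        bucketSeconds = new long[numBuckets];
        bucketEmpty = new boolean[numBuckets];
        Arrays.fill(bucketSeconds, -1L);
        Arrays.fill(bucketEmpty, true);
    }

    /** Records a value for the given second (assumed non-negative). */
    void add(long epochSecond, long value) {
        int idx = (int) (epochSecond % bucketValues.length);
        if (bucketSeconds[idx] != epochSecond) {
            // Replacing a stale bucket: subtract its old contents from the
            // tally, skipping the arithmetic entirely if it was empty.
            if (!bucketEmpty[idx]) {
                runningTally -= bucketValues[idx];
            }
            bucketValues[idx] = 0L;
            bucketEmpty[idx] = true;
            bucketSeconds[idx] = epochSecond;
        }
        // Update the bucket and the running tally together.
        bucketValues[idx] += value;
        bucketEmpty[idx] = false;
        runningTally += value;
    }

    /** Reading the window total is a plain 'get'; no iteration over buckets. */
    long getTally() {
        return runningTally;
    }

    public static void main(String[] args) {
        RunningTallySketch sketch = new RunningTallySketch(300);
        sketch.add(0L, 10L);
        sketch.add(1L, 5L);
        sketch.add(300L, 7L); // wraps around and replaces the bucket for second 0
        System.out.println(sketch.getTally());
    }
}
```

One thing this sketch leaves out: a bucket that ages out of the window without ever being overwritten stays in the tally, so a fuller version would presumably also need to expire such buckets when reading or on a timer.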



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
