[
https://issues.apache.org/jira/browse/NIFI-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Payne updated NIFI-5466:
-----------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
> Improve efficiency of RingBufferEventRepository
> -----------------------------------------------
>
> Key: NIFI-5466
> URL: https://issues.apache.org/jira/browse/NIFI-5466
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
> Fix For: 1.8.0
>
>
> The FlowFileEventRepository implementation (i.e., the
> RingBufferEventRepository) keeps 300 buckets for each component. Each
> bucket holds a set of 16+ stats (bytes read, bytes written, flowfiles in,
> flowfiles out, etc., as well as counters).
> Each time that a user refreshes stats, we calculate the 5-minute window for
> each component by iterating over all 300 buckets, then summing together the
> values of the stats. So, for every 1,000 processors, we have 1000 * 300 * 16
> = 4.8 MM calculations being performed each time that the user refreshes.
> We can improve this significantly. To do this, we can keep a 'running tally'
> for each stat. Every time that we update a bucket in the
> FlowFileEventRepository, we should similarly update the running tally. Each
> time that we 'replace' a bucket in the FlowFileEventRepository, we should
> subtract from the running tally the values in the bucket, in addition to
> adding in the new value. This would result in amortizing the cost of
> calculation over the 5-minute period and means that every time that we get
> the processor stats we could do so by simply performing a 'get' operation
> without any calculations.
> But wait, there's more! The EventSumValue class has an {{add(FlowFileEvent)}}
> method. If the EventSumValue were to simply keep a boolean value indicating
> whether or not it was empty, then when adding to an empty bucket we could
> avoid the calculations altogether, and when replacing an empty bucket we
> could avoid the calculation against the running tally as well.
> Similarly, in the VolatileComponentStatusRepository, when we perform a
> 'capture' we should detect any 'empty' stats objects and use a single static
> instance for this, which would significantly reduce the amount of heap that
> is used to store metrics.
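
The running-tally idea described above can be illustrated with a minimal sketch. This is not the NiFi implementation: the class RunningTallyBuffer, its methods, and the use of plain long arrays in place of EventSumValue are all hypothetical, chosen only to show the bookkeeping. It assumes each update carries one delta per stat and that callers handle synchronization.

```java
import java.util.Arrays;

// Hypothetical sketch; names and structure are illustrative, not NiFi's actual classes.
final class RunningTallyBuffer {
    private final long[][] buckets; // buckets[i][s] = value of stat s in bucket i
    private final boolean[] empty;  // true while bucket i holds no data
    private final long[] tally;     // running sum of each stat across all buckets
    private int current = 0;        // index of the bucket currently being written

    RunningTallyBuffer(int bucketCount, int statCount) {
        buckets = new long[bucketCount][statCount];
        empty = new boolean[bucketCount];
        Arrays.fill(empty, true);
        tally = new long[statCount];
    }

    /** Adds one event's deltas to the current bucket and to the running tally. */
    void update(long[] deltas) {
        long[] bucket = buckets[current];
        for (int s = 0; s < tally.length; s++) {
            bucket[s] += deltas[s];
            tally[s] += deltas[s];
        }
        empty[current] = false;
    }

    /** Moves to the next bucket, removing the values it previously held from the tally. */
    void advance() {
        current = (current + 1) % buckets.length;
        if (empty[current]) {
            return; // an empty bucket contributes nothing, so there is nothing to subtract
        }
        long[] expired = buckets[current];
        for (int s = 0; s < tally.length; s++) {
            tally[s] -= expired[s];
            expired[s] = 0;
        }
        empty[current] = true;
    }

    /** Returns the windowed total for one stat without iterating over the buckets. */
    long get(int stat) {
        return tally[stat];
    }
}
```

With 300 buckets and 16 stats, get() reads a single array slot instead of summing 300 values per stat, and the subtraction cost is paid once per bucket replacement rather than on every refresh.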
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)