[ https://issues.apache.org/jira/browse/NIFI-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Bende updated NIFI-7992:
------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Content Repository can fail to cleanup archive directory fast enough
> --------------------------------------------------------------------
>
>                 Key: NIFI-7992
>                 URL: https://issues.apache.org/jira/browse/NIFI-7992
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Critical
>             Fix For: 1.13.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For the scenario where a user is generating many small FlowFiles and has the "nifi.content.claim.max.appendable.size" property set to a small value, we can encounter a situation where data is constantly archived but not cleaned up quickly enough. As a result, the Content Repository can run out of space. The FileSystemRepository has a backpressure mechanism built in to avoid allowing this to happen, but under the above conditions it can sometimes fail to prevent this situation. The backpressure mechanism works by performing the following steps:
> # When a new Content Claim is created, the Content Repository determines which 'container' to use.
> # The Content Repository checks whether the amount of storage space used for the container is greater than the configured backpressure threshold.
> # If so, the thread blocks until a background task completes cleanup of the archive directories.
> However, in Step #2 above, it determines the amount of space currently in use by looking at a cached member variable. That cached member variable is updated only on the first iteration and whenever the background task completes.
> So, now consider a case where there are millions of files in the content repository archive. The background task could take a massive amount of time performing cleanup. Meanwhile, processors are able to write to the repository without any backpressure being applied, because the background task hasn't updated the cached variable for the amount of space used. This continues until the content repository fills.
> There are three important, very simple things that should be changed:
> # The background task should be faster in this case. While we cannot improve the amount of time it takes to destroy the files themselves, we currently create an ArrayList to hold all of the file info and then remove entries one at a time via an Iterator, calling remove(). Under the hood, each removal shifts the remaining elements of the underlying array. On my laptop, performing this procedure on an ArrayList with 1 million elements took approximately 1 minute. Switching to a LinkedList took 15 milliseconds but used much more heap. Keeping an ArrayList and then removing all of the elements at the end (via ArrayList.subList(0, n).clear()) resulted in performance similar to LinkedList with the memory footprint of ArrayList. (A sketch of this technique follows the issue text below.)
> # The check of whether the content repository's usage has crossed the threshold should not rely entirely on a cache that is populated by a process that can take a long time. It should periodically calculate the disk usage itself (perhaps once per minute).
> # When backpressure does get applied, it can appear that the system has frozen, not performing any work. The background task that is clearing space should periodically log its progress at INFO level so that users understand that this action is taking place. (A sketch covering these last two points also follows below.)
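For illustration, here is a minimal, self-contained Java sketch of the list-removal point in item #1. The class name and timing harness are hypothetical; only the ArrayList.subList(0, n).clear() idiom comes from the issue text. With 1 million elements the iterator-based pass may take on the order of a minute, matching the observation above.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BulkRemovalDemo {

    public static void main(String[] args) {
        final int size = 1_000_000;

        // Removing elements one at a time via Iterator.remove() shifts the
        // remaining elements of the backing array on every call, so removing
        // all n elements costs O(n^2) work overall.
        final List<Integer> slow = newList(size);
        long start = System.nanoTime();
        final Iterator<Integer> itr = slow.iterator();
        while (itr.hasNext()) {
            itr.next();
            itr.remove();
        }
        System.out.printf("Iterator.remove(): %d ms%n", (System.nanoTime() - start) / 1_000_000);

        // Clearing a subList removes the whole range in one operation,
        // keeping ArrayList's compact memory footprint.
        final List<Integer> fast = newList(size);
        start = System.nanoTime();
        fast.subList(0, fast.size()).clear();
        System.out.printf("subList().clear(): %d ms%n", (System.nanoTime() - start) / 1_000_000);
    }

    private static List<Integer> newList(final int size) {
        final List<Integer> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
        return list;
    }
}
{code}

The reason the subList approach works is that ArrayList.subList(0, n).clear() delegates to removeRange, which shifts the surviving tail with a single arraycopy rather than once per removed element.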
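And a hypothetical sketch of items #2 and #3: recomputing disk usage on a fixed interval rather than trusting a cache that is refreshed only when cleanup finishes, and logging cleanup progress at INFO level. None of these names come from the NiFi codebase; this is not the FileSystemRepository implementation, only an outline using the standard java.nio.file.FileStore and SLF4J APIs.

{code:java}
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical names throughout; a sketch of the proposed behavior, not NiFi's code.
public class ContainerUsageMonitor {
    private static final Logger logger = LoggerFactory.getLogger(ContainerUsageMonitor.class);
    private static final long REFRESH_NANOS = TimeUnit.MINUTES.toNanos(1);

    private final Path container;
    private volatile long lastRefresh;
    private volatile long cachedUsedBytes;

    public ContainerUsageMonitor(final Path container) {
        this.container = container;
        this.lastRefresh = System.nanoTime() - REFRESH_NANOS; // force refresh on first call
    }

    // Recalculate disk usage at most once per minute instead of waiting for
    // the (potentially very long-running) cleanup task to refresh the cache.
    public long getUsedBytes() throws IOException {
        final long now = System.nanoTime();
        if (now - lastRefresh >= REFRESH_NANOS) {
            final FileStore store = Files.getFileStore(container);
            cachedUsedBytes = store.getTotalSpace() - store.getUsableSpace();
            lastRefresh = now;
        }
        return cachedUsedBytes;
    }

    // Periodic INFO-level progress logging for the cleanup loop, so an
    // operator can see that work is happening while writer threads block.
    public void logProgress(final long filesRemoved, final long totalFiles) {
        if (filesRemoved % 10_000 == 0) {
            logger.info("Archive cleanup for {} has removed {} of {} files", container, filesRemoved, totalFiles);
        }
    }
}
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)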