[
https://issues.apache.org/jira/browse/NIFI-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229447#comment-17229447
]
Joe Witt commented on NIFI-7992:
--------------------------------
Didn't review the code in detail but did review this writeup and thinking back
about 8 years ago when I think we last talked about this...the writeup/change
makes a lot of sense!
> Content Repository can fail to cleanup archive directory fast enough
> --------------------------------------------------------------------
>
> Key: NIFI-7992
> URL: https://issues.apache.org/jira/browse/NIFI-7992
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Critical
> Fix For: 1.13.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> For the scenario where a user is generating many small FlowFiles and has the
> "nifi.content.claim.max.appendable.size" property set to a small value, we
> can encounter a situation where data is constantly archived but not cleaned
> up quickly enough. As a result, the Content Repository can run out of space.
> The FileSystemRepository has a backpressure mechanism built in to avoid
> allowing this to happen, but under the above conditions, it can sometimes
> fail to prevent this situation. The backpressure mechanism works by
> performing the following steps:
> # When a new Content Claim is created, the Content Repository determines
> which 'container' to use.
> # Content Repository checks if the amount of storage space used for the
> container is greater than the configured backpressure threshold.
> # If so, the thread blocks until a background task completes cleanup of the
> archive directories.
> However, in Step #2 above, the repository determines the amount of space
> currently being used by looking at a cached member variable. That cached
> member variable is only updated on the first iteration and when the
> background task completes.
> So, now consider a case where there are millions of files in the content
> repository archive. The background task could take a massive amount of time
> performing cleanup. Meanwhile, processors are able to write to the repository
> without any backpressure being applied because the background task hasn't
> updated the cached variable for the amount of space used. This continues
> until the content repository fills.
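The stale-cache failure mode described above can be reduced to a minimal sketch. The class and method names below are illustrative only, not the actual FileSystemRepository API:

```java
public class CachedUsageCheck {
    // Refreshed only on the first iteration and when the background
    // cleanup task completes -- never while that task is still running
    private long cachedUsedBytes;
    private final long thresholdBytes;

    public CachedUsageCheck(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    // Step 2: the backpressure check consults the cache,
    // not the actual disk usage
    public boolean shouldApplyBackpressure() {
        return cachedUsedBytes >= thresholdBytes;
    }

    // Only the background task refreshes the cache; if it is busy
    // deleting millions of archive files, this may not run for hours
    public void onCleanupComplete(long measuredUsedBytes) {
        this.cachedUsedBytes = measuredUsedBytes;
    }

    public static void main(String[] args) {
        CachedUsageCheck check = new CachedUsageCheck(100L);
        check.onCleanupComplete(50L);  // last measurement: under threshold
        // Actual usage may have grown well past 100 bytes by now, but no
        // backpressure is applied because the cache still says 50
        System.out.println(check.shouldApplyBackpressure());  // false
    }
}
```

Writers keep passing the check as long as the cache holds a pre-threshold value, which is exactly why fix #2 below calls for periodically recomputing the usage independently of the cleanup task.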
> There are three important but very simple things that should be changed:
> # The background task should be faster in this case. While we cannot improve
> the amount of time it takes to destroy the files, we do create an ArrayList
> to hold all of the file info and then use an iterator, calling remove().
> Under the hood, this shifts the remaining contents of the underlying array
> for each file that is removed. On my laptop, performing this procedure on an
> ArrayList with 1 million elements took approximately 1 minute. Changing to a
> LinkedList took 15 milliseconds but used much more heap. Keeping an
> ArrayList, then removing all of the elements at the end (via
> ArrayList.subList(0, n).clear()) resulted in similar performance to
> LinkedList with the memory footprint of ArrayList.
> # The check to see whether or not the content repository's usage has crossed
> the threshold should not rely entirely on a cache that is populated by a
> process that can take a long time. It should periodically calculate the disk
> usage itself (perhaps once per minute).
> # When backpressure does get applied, it can appear that the system has
> frozen up, not performing any sort of work. The background task that is
> clearing space should periodically log its progress at INFO level to allow
> users to understand that this action is taking place.
>
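The removal technique from change #1 can be shown in isolation. The list contents here are placeholders, not real archive entries:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BulkRemoval {
    // Slow: each Iterator.remove() on an ArrayList shifts every remaining
    // element left by one, so removing the first n entries is O(n * size)
    static void removeViaIterator(List<String> files, int n) {
        Iterator<String> it = files.iterator();
        for (int i = 0; i < n && it.hasNext(); i++) {
            it.next();
            it.remove();
        }
    }

    // Fast: subList(0, n).clear() removes the whole range with a single
    // shift of the tail, keeping ArrayList's compact memory footprint
    static void removeViaSubList(List<String> files, int n) {
        files.subList(0, n).clear();
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            files.add("claim-" + i);
        }
        removeViaSubList(files, 900_000);
        System.out.println(files.size() + " entries remain");
    }
}
```

Both methods leave the list in the same state; only the cost differs, which is why the fix can swap the removal strategy without changing the cleanup logic around it.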
--
This message was sent by Atlassian Jira
(v8.3.4#803005)