[ 
https://issues.apache.org/jira/browse/NIFI-731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608507#comment-14608507
 ] 

Mark Payne commented on NIFI-731:
---------------------------------

I have made some changes to the way that the FlowFileRepository and 
ContentRepository interact with one another. I created some benchmarks to 
compare the results before and after the changes. The changes include a new 
property in the nifi.properties file to configure how often the FlowFile Repo 
performs a 'sync' (and a good bit of documentation added to the Admin Guide 
about what this means).
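To illustrate the idea behind the new sync setting, here is a minimal sketch of forcing a repository file to disk only every N updates rather than on every update. The class and property names here are hypothetical, purely for illustration; they are not NiFi's actual classes or the actual property name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

// Illustrative sketch: write every update, but fsync the channel
// only once per 'syncFrequency' updates (e.g. every 100 updates,
// as in the benchmark settings below).
class PeriodicSyncWriter {
    private final FileChannel channel;
    private final int syncFrequency;
    private long updatesSinceSync = 0;

    PeriodicSyncWriter(final FileChannel channel, final int syncFrequency) {
        this.channel = channel;
        this.syncFrequency = syncFrequency;
    }

    void update(final ByteBuffer record) throws IOException {
        channel.write(record);
        if (++updatesSinceSync >= syncFrequency) {
            channel.force(false); // flush data (not metadata) to disk
            updatesSinceSync = 0;
        }
    }
}
```

The trade-off is durability vs. throughput: a larger sync interval means fewer expensive fsync calls, but more updates at risk if the machine loses power between syncs.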

The benchmark was performed on both my desktop and my laptop. Note that the 
flow used is designed specifically to trigger this issue by creating 
Content Claims that are exactly 1 byte in size, so that massive stress is put 
on deleting tons of files. It is not intended to mimic a typical flow.

After the change
----------------------
FlowFile Repo Settings:
8 partitions
Sync every 100 updates
 
Hardware:
Laptop: 1 drive, 5400 RPM
Desktop: 2 drives, 7200 RPM (1 for Content, 1 for FlowFile)
 
Flow:
GenerateFlowFile -> LogAttribute
GenerateFlowFile set to 1 byte files, batch size of 1, 0 ms run duration. So 1 
byte per Content Claim/File on Disk
LogAttribute set to 'debug' level so it doesn't actually log. 25 ms run 
duration.
 
With Content Repo's archive disabled:
Laptop: 125,000 FlowFiles / 5 min. Warns about backpressure
Desktop: 1.03 million FlowFiles / 5 min. Does not warn about backpressure
 
With archive enabled:
Laptop:  25,000 FlowFiles / 5 min. Warns about backpressure
Desktop: 115,000 FlowFiles / 5 min. Warns about backpressure
 - Changed the "Batch Size" property of GenerateFlowFile to 5 FlowFiles per 
Content Claim. Got 435,000 FlowFiles - about 5 times as many, which is what I 
expected, but a good sanity check.


"Baseline" to compare against, before the patch was applied
----------------------------------------------------------------------------
Laptop: Reached 60,000 FlowFiles/5 mins, then saw a very long pause as the 
Content Repo destroyed content. FlowFiles per 5 mins dropped from 60K to 30K 
and eventually to under 15K, then oscillated up and down repeatedly. Pauses 
were very noticeable in the UI.
Desktop: 481,000 FlowFiles/5 mins, then saw a very long pause as the Content 
Repo destroyed content. FlowFiles per 5 mins then dropped and fluctuated 
similarly to the laptop's results.



> If content repo is unable to destroy content as fast as it is generated, nifi 
> performance becomes very sporadic
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-731
>                 URL: https://issues.apache.org/jira/browse/NIFI-731
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 0.2.0
>
>
> When the FlowFile Repository marks claims as destructable, it puts the 
> notification on a queue that the content repo pulls from. If the content repo 
> cannot keep up, the queue will fill, resulting in backpressure that prevents 
> the FlowFile repository from being updated. This, in turn, causes Processors 
> to block, waiting for space to become available. This is by design.
> However, the capacity of this queue is quite large, and the content repo 
> drains the entire queue, then destroys all content claims that are on it. As 
> a result, this act of destroying claims can be quite long, and Processors can 
> block for quite a period of time, leading to very sporadic performance.
> Instead, the content repo should pull from the queue and destroy the claims 
> one at a time or in small batches, instead of draining the entire queue each 
> time. This should result in much less sporadic behavior.
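The batching change described in the issue can be sketched as follows. Instead of draining the entire destructable-claims queue in one pass (which stalls Processors for as long as the destruction takes), the content repo pulls at most a small, bounded batch per pass. The class name and batch size here are hypothetical, not NiFi's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch: pull claims off the destructable queue in small,
// bounded batches rather than draining the whole queue at once, so that
// backpressure is relieved incrementally and pauses stay short.
class ClaimDestructor {
    static final int MAX_BATCH = 100; // hypothetical batch size

    static <T> List<T> nextBatch(final Queue<T> destructable) {
        final List<T> batch = new ArrayList<>(MAX_BATCH);
        T claim;
        // Stop after MAX_BATCH claims so other work can run between passes.
        while (batch.size() < MAX_BATCH && (claim = destructable.poll()) != null) {
            batch.add(claim);
        }
        return batch;
    }
}
```

Each call returns at most MAX_BATCH claims to destroy; the caller loops, destroying one batch at a time, so the FlowFile repository is never blocked for the duration of a full queue drain.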



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
