[
https://issues.apache.org/jira/browse/NIFI-731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610193#comment-14610193
]
Mark Payne commented on NIFI-731:
---------------------------------
The patch supplied here provides a few improvements. It allows the user to
synchronize individual partitions of the FlowFile Repo at regular intervals,
which allows some content claims to be archived/destroyed right away.
Currently, we wait until the repo is checkpointed and then start destroying
all content claims at once, so this will provide smoother performance.
Additionally, it allows the user to change the number of partitions used by the
FlowFile Repo. Experimentation shows that 16 partitions is generally enough and
results in much better performance than 256, so the default was also changed
from 256 to 16.
A better but much more involved solution is to allow the Content Repository to
append to an existing Content Claim, as described in NIFI-744. This will result
in far fewer files to delete, which should largely alleviate this problem.
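
To illustrate the per-partition synchronization idea, here is a minimal
sketch. It is not the code from the patch: the class PartitionedRepoSketch and
its methods are hypothetical, claims are modeled as plain strings, and the
actual disk sync is left out. It assumes that a claim may only be released once
the partition that recorded its release has been synced.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not NiFi's implementation: each partition tracks the
// claims that may be released once that partition has been synced.
class PartitionedRepoSketch {
    private final List<List<String>> releasableByPartition = new ArrayList<>();
    private int nextPartition = 0;

    PartitionedRepoSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            releasableByPartition.add(new ArrayList<>());
        }
    }

    // Record that an update written to this partition left a claim unreferenced.
    synchronized void recordReleasable(int partition, String claimId) {
        releasableByPartition.get(partition).add(claimId);
    }

    // Sync a single partition (round-robin) and return the claims that can be
    // archived/destroyed now, without waiting for a full checkpoint.
    synchronized List<String> syncNextPartition() {
        List<String> pending = releasableByPartition.get(nextPartition);
        List<String> released = new ArrayList<>(pending);
        pending.clear();
        nextPartition = (nextPartition + 1) % releasableByPartition.size();
        return released;
    }

    // Checkpoint: release the pending claims of every partition at once.
    synchronized List<String> checkpoint() {
        List<String> released = new ArrayList<>();
        for (List<String> pending : releasableByPartition) {
            released.addAll(pending);
            pending.clear();
        }
        return released;
    }
}
```

In a sketch like this, syncNextPartition() could be driven by a
ScheduledExecutorService at a fixed delay, so that claims trickle out between
checkpoints rather than arriving in one large burst.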
> If content repo is unable to destroy content as fast as it is generated, nifi
> performance becomes very sporadic
> ---------------------------------------------------------------------------------------------------------------
>
> Key: NIFI-731
> URL: https://issues.apache.org/jira/browse/NIFI-731
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Fix For: 0.2.0
>
> Attachments:
> 0001-NIFI-731-Updated-admin-guide-to-explain-the-flowfile.patch
>
>
> When the FlowFile Repository marks claims as destructible, it puts the
> notification on a queue that the content repo pulls from. If the content repo
> cannot keep up, the queue will fill, resulting in backpressure that prevents
> the FlowFile repository from being updated. This, in turn, causes Processors
> to block, waiting on space to become available. This is by design.
> However, the capacity of this queue is quite large, and the content repo
> drains the entire queue, then destroys all content claims that were on it. As
> a result, destroying the claims can take a long time, and Processors can
> block for that whole period, leading to very sporadic performance.
> Instead of draining the entire queue each time, the content repo should pull
> from the queue and destroy the claims one at a time or in small batches. This
> should result in much less sporadic behavior.
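
The small-batch approach proposed in the description can be sketched as
follows. This is an illustration under assumptions, not NiFi's code: the class
BatchedClaimDestroyer, its method names, and the use of strings as claim
identifiers are all hypothetical. A bounded queue provides the backpressure,
and the consumer removes at most a fixed number of claims per pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch; names are illustrative and not taken from NiFi.
class BatchedClaimDestroyer {
    private final BlockingQueue<String> destructibleClaims;
    private final int maxBatchSize;
    private final Consumer<String> destroyer;

    BatchedClaimDestroyer(int queueCapacity, int maxBatchSize, Consumer<String> destroyer) {
        this.destructibleClaims = new LinkedBlockingQueue<>(queueCapacity);
        this.maxBatchSize = maxBatchSize;
        this.destroyer = destroyer;
    }

    // Producer side: waits up to the timeout for space in the bounded queue.
    // A false return means the queue stayed full, i.e. backpressure.
    boolean markDestructible(String claimId, long timeout, TimeUnit unit) {
        try {
            return destructibleClaims.offer(claimId, timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Consumer side: take at most maxBatchSize claims instead of draining the
    // whole queue, so space is freed for producers after each short pass.
    int destroyNextBatch() {
        List<String> batch = new ArrayList<>(maxBatchSize);
        destructibleClaims.drainTo(batch, maxBatchSize);
        batch.forEach(destroyer);
        return batch.size();
    }

    int pending() {
        return destructibleClaims.size();
    }
}
```

Because the claims are removed from the queue before they are destroyed, a
small batch frees queue space early and keeps each destroy pass short, which is
the behavior the description asks for.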