[
https://issues.apache.org/jira/browse/NIFI-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099102#comment-16099102
]
Joseph Witt commented on NIFI-3376:
-----------------------------------
I think you do have a group of people that want to help find a resolution. It
might make sense to retitle this JIRA to 'Content repository can be too full
with large content claims', as that focuses on the problem to solve rather
than presuming a path to resolution (compaction).
What was considered unfavorable is the initial compaction-based proposal. This
is not because it is hard or because compaction is a bad idea in general, but
rather because the current design of the repositories does not lend itself to
doing compaction while achieving the other performance goals.
So, alternatively what has been offered is the following:
# Lower the max appendable size. This is a mitigating approach and is available
now thanks to the bug fix for it.
# Add monitoring/logging so we can actually observe/understand the case
better, which could feed into a more data-driven discussion.
# [latest] Consider some max size for any single written content so that it
would live on its own.
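For concreteness, mitigation #1 maps to a setting in nifi.properties. The value shown here is purely illustrative, not a recommendation; the right number depends on your flow's content sizes:

```properties
# nifi.properties -- illustrative value only, tune for your flow.
# A smaller appendable size means fewer content claims are packed into
# each resource claim file, so a resource claim's reference count is
# more likely to reach 0 and become eligible for deletion/archive.
nifi.content.claim.max.appendable.size=50 KB
```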
The current default max size is frankly pretty arbitrary and not backed by much
real data. I suspect the use cases you're finding it problematic for could
achieve similar performance and stability without the incidence of unpurgeable
claims. We should try #1 for your case. We should have done #2 so we can
better monitor it. Doing #3 could help, but it might not even be necessary if
we simply move to a better, more data-driven default value and document the
trade-off otherwise.
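To make the accounting gap concrete, here is a toy sketch (not NiFi code, just the mechanism as described in this issue): many small content claims appended into one resource claim file, where a single surviving claim pins the entire file on disk.

```python
# Toy model of the claim-packing behavior described in this issue.
# Not NiFi's implementation -- names and structure are illustrative.

class ResourceClaim:
    """One backing file on disk holding many appended content claims."""

    def __init__(self):
        self.sizes = {}   # claim_id -> size in bytes
        self.refs = {}    # claim_id -> reference count

    def append(self, claim_id, size):
        self.sizes[claim_id] = size
        self.refs[claim_id] = 1

    def release(self, claim_id):
        self.refs[claim_id] -= 1

    def file_size(self):
        # The backing file never shrinks without compaction.
        return sum(self.sizes.values())

    def live_size(self):
        # What NiFi would report as queued content.
        return sum(s for c, s in self.sizes.items() if self.refs[c] > 0)

    def deletable(self):
        # Eligible for deletion/archive only when NO claim is referenced.
        return all(r == 0 for r in self.refs.values())


rc = ResourceClaim()
for i in range(100):
    rc.append(i, 10_240)      # one hundred 10 KB content claims
for i in range(99):
    rc.release(i)             # all but one flowfile terminates

print(rc.file_size())         # 1024000 bytes still occupied on disk
print(rc.live_size())         # 10240 bytes reported in queues
print(rc.deletable())         # False: one live claim pins the file
```

With a smaller max appendable size, fewer claims share each file, so the odds that every claim in a given file terminates (and the file becomes deletable) go up.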
Again, there is no question that this is an important and interesting topic to
come to resolution on.
> Implement content repository ResourceClaim compaction
> -----------------------------------------------------
>
> Key: NIFI-3376
> URL: https://issues.apache.org/jira/browse/NIFI-3376
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Affects Versions: 0.7.1, 1.1.1
> Reporter: Michael Moser
> Assignee: Michael Hogue
>
> On NiFi systems that deal with many files whose size is less than 1 MB, we
> often see that the actual disk usage of the content_repository is much
> greater than the size of flowfiles that NiFi reports are in its queues. As
> an example, NiFi may report "50,000 / 12.5 GB" but the content_repository
> takes up 240 GB of its file system. This leads to scenarios where a 500 GB
> content_repository file system gets 100% full, but "I only had 40 GB of data
> in my NiFi!"
> When several content claims exist in a single resource claim, and most but
> not all content claims are terminated, the entire resource claim is still not
> eligible for deletion or archive. This could mean that only one 10 KB
> content claim out of a 1 MB resource claim is counted by NiFi as existing in
> its queues.
> If a particular flow has a slow egress point where flowfiles could back up
> and remain on the system longer than expected, this problem is exacerbated.
> A potential solution is to compact resource claim files on disk. A background
> thread could examine all resource claims and, for those that get "old" and
> whose active content claim usage drops below a threshold, rewrite the
> resource claim file.
> A potential work-around is to allow modification of the FileSystemRepository
> MAX_APPENDABLE_CLAIM_LENGTH to make it a smaller number. This would increase
> the probability that the content claims reference count in a resource claim
> would reach 0 and the resource claim becomes eligible for deletion/archive.
> Let users trade-off performance for more accurate accounting of NiFi queue
> size to content repository size.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)