[
https://issues.apache.org/jira/browse/NIFI-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt Burgess updated NIFI-10817:
--------------------------------
Status: Patch Available (was: Open)
> Stateless NiFi does not release FlowFile content until flow completes
> ---------------------------------------------------------------------
>
> Key: NIFI-10817
> URL: https://issues.apache.org/jira/browse/NIFI-10817
> Project: Apache NiFi
> Issue Type: Bug
> Components: NiFi Stateless
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> When a stateless flow is run, the content that is stored in the content
> repository is not cleaned up until the flow completes.
> This means that if we have the following flow:
> ConsumeKafka -> ReplaceText -> MergeContent (1000 FlowFile bucket) ->
> MergeContent (1000 FlowFile bucket) -> PutS3
> The intent here is to pull data, transform it, merge many records together,
> and put the result to S3. The expectation is that no more than 1,000,000
> Kafka messages are in the content repo at a time, with two copies of each
> (1,000 FlowFiles, each containing 1,000 Kafka messages, waiting to be merged
> PLUS the merged result).
> However, what we see is that we have the final merged content, plus the 1,000
> bundles ahead of it still in the repo (expected), PLUS the 1,000,000
> individual transformed messages PLUS the original 1,000,000 messages. These
> intermediate FlowFiles' contents should be purged as aggressively as they can
> be. This is particularly important when using an in-memory content repository.
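
To make the arithmetic above concrete, here is a rough back-of-the-envelope
sketch. The class and method names are made up for illustration and are not
part of NiFi; it only counts message copies for the two 1,000-FlowFile merge
bins described in the report.

```java
/** Hypothetical helper: counts message copies held in the content repo. */
final class ContentCopies {
    /** Expected: the bundles waiting for the second merge, plus the merged result. */
    static long expected(long bundleSize, long bundleCount) {
        long messages = bundleSize * bundleCount;
        return messages + messages;
    }

    /** Observed per the report: expected, plus transformed and original messages. */
    static long observed(long bundleSize, long bundleCount) {
        long messages = bundleSize * bundleCount;
        return expected(bundleSize, bundleCount) + messages + messages;
    }

    public static void main(String[] args) {
        System.out.println("expected copies: " + expected(1_000, 1_000));
        System.out.println("observed copies: " + observed(1_000, 1_000));
    }
}
```

With both bins at 1,000 this gives 2,000,000 expected copies against
4,000,000 observed, i.e. roughly twice the content that should be retained.
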
> The in-memory content repository does not actually store the content within
> the repo but rather facilitates a mechanism by which the content can be
> written to the claim held by the FlowFileRecord. Then, when no longer
> referenced, we rely on garbage collection to clean up. However, it appears
> that the ProcessSession is holding on to all of these intermediate claims in
> its {{records}} member variable, and we can purge those much more
> aggressively.
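
The retention problem described above can be sketched as follows. This is a
simplified model only: SketchSession and its methods are hypothetical
stand-ins, not NiFi's ProcessSession or the actual patch. It assumes content
lives in byte arrays that stay reachable for as long as a map (playing the
role of {{records}}) references them.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a session tracking content by id. Not NiFi code. */
final class SketchSession {
    // Plays the role of the `records` member: each entry is a strong
    // reference, so its byte[] cannot be garbage collected while it is here.
    private final Map<Long, byte[]> records = new HashMap<>();
    private long nextId = 0;

    long create(byte[] content) {
        long id = nextId++;
        records.put(id, content);
        return id;
    }

    /** Merges the inputs into one new record, optionally dropping the inputs. */
    long merge(long[] inputIds, boolean purgeInputs) {
        int total = 0;
        for (long id : inputIds) {
            total += records.get(id).length;
        }
        byte[] merged = new byte[total];
        int offset = 0;
        for (long id : inputIds) {
            byte[] part = records.get(id);
            System.arraycopy(part, 0, merged, offset, part.length);
            offset += part.length;
            if (purgeInputs) {
                records.remove(id); // input content becomes unreachable here
            }
        }
        return create(merged);
    }

    /** Total bytes still strongly referenced by this session. */
    long retainedBytes() {
        long sum = 0;
        for (byte[] content : records.values()) {
            sum += content.length;
        }
        return sum;
    }

    /** Creates `count` records of `size` bytes, merges them, reports retention. */
    static long retainedAfterMerge(int count, int size, boolean purgeInputs) {
        SketchSession session = new SketchSession();
        long[] ids = new long[count];
        for (int i = 0; i < count; i++) {
            ids[i] = session.create(new byte[size]);
        }
        session.merge(ids, purgeInputs);
        return session.retainedBytes();
    }

    public static void main(String[] args) {
        System.out.println("without purge: " + retainedAfterMerge(1_000, 10, false));
        System.out.println("with purge:    " + retainedAfterMerge(1_000, 10, true));
    }
}
```

In this model, keeping the inputs doubles what the session retains after a
merge, while removing each input as soon as it has been consumed leaves only
the merged result reachable. Where the real session can safely drop its
entries is a question for the actual change and is not shown here.
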
--
This message was sent by Atlassian Jira
(v8.20.10#820010)