[ 
https://issues.apache.org/jira/browse/NIFI-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-10817:
--------------------------------
    Status: Patch Available  (was: Open)

> Stateless NiFi does not release FlowFile content until flow completes
> ---------------------------------------------------------------------
>
>                 Key: NIFI-10817
>                 URL: https://issues.apache.org/jira/browse/NIFI-10817
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: NiFi Stateless
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When a stateless flow is run, the content that is stored in the content 
> repository is not cleaned up until the flow completes.
> For example, consider the following flow:
> ConsumeKafka -> ReplaceText -> MergeContent (1000 FlowFile bucket) -> 
> MergeContent (1000 FlowFile bucket) -> PutS3
> The intent is to pull data, transform it, merge many records together, and 
> put the result to S3. We expect to have no more than 1,000,000 Kafka 
> messages in the content repo at a time, with two copies of each: the 1,000 
> FlowFiles (each containing 1,000 Kafka messages) waiting to be merged, PLUS 
> the merged result.
> However, what we see is that we have the final merged content, plus the 1,000 
> bundles ahead of it still in the repo (expected), PLUS the 1,000,000 
> individual transformed messages PLUS the original 1,000,000 messages. These 
> intermediate FlowFiles' contents should be purged as aggressively as they can 
> be. This is particularly important when using an in-memory content repository.
> The in-memory content repository does not actually store the content within 
> the repo but rather facilitates a mechanism by which the content can be 
> written to the claim held by the FlowFileRecord. Then, when no longer 
> referenced, we rely on garbage collection to clean up. However, it appears 
> that the ProcessSession is holding on to all of these intermediate claims in 
> its {{records}} member variable, and we can purge those much more 
> aggressively.
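
The counts in the description can be checked with a little arithmetic. The sketch below is only a tally of message copies, assuming one Kafka message per FlowFile out of ConsumeKafka and a bucket size of 1,000 at each MergeContent, as the description implies. The variable names are illustrative and do not come from NiFi.

```python
# Toy tally of message copies resident in the content repo for the flow
# ConsumeKafka -> ReplaceText -> MergeContent(1000) -> MergeContent(1000) -> PutS3.
# Every name below is illustrative; none of it is NiFi code.

BUCKET_SIZE = 1_000          # FlowFiles per MergeContent bucket, per the description
MESSAGES = BUCKET_SIZE ** 2  # Kafka messages that end up in one final merged FlowFile

# Copies the reporter expects to be resident just before PutS3 runs.
expected = {
    "bundles awaiting the second merge": MESSAGES,  # 1,000 FlowFiles x 1,000 messages
    "final merged FlowFile": MESSAGES,
}

# Copies actually observed, because intermediate content is not released.
observed = dict(expected)
observed["transformed messages (ReplaceText output)"] = MESSAGES
observed["original messages (ConsumeKafka output)"] = MESSAGES

expected_total = sum(expected.values())
observed_total = sum(observed.values())

print(f"expected copies: {expected_total:,}")   # expected copies: 2,000,000
print(f"observed copies: {observed_total:,}")   # observed copies: 4,000,000
print(f"overhead factor: {observed_total / expected_total:.1f}x")  # overhead factor: 2.0x
```

Under these assumptions the flow holds roughly twice as many message copies as intended, which matters most when the content lives on the heap.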

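The retention problem described for the {{records}} member can be illustrated with a toy model. This is a minimal sketch, not NiFi's ProcessSession: the class ToySession, its methods, and the purge_inputs flag are all hypothetical, and it tracks held bytes directly instead of relying on a garbage collector.

```python
class ToySession:
    """Hypothetical stand-in for a session; this is not NiFi's ProcessSession API."""

    def __init__(self):
        # Maps a FlowFile id to its content, mirroring the role the issue
        # attributes to the session's `records` member.
        self.records = {}

    def write(self, flowfile_id, content):
        self.records[flowfile_id] = content

    def merge(self, input_ids, output_id, purge_inputs):
        """Concatenate the inputs into one output; optionally drop the inputs."""
        self.records[output_id] = b"".join(self.records[i] for i in input_ids)
        if purge_inputs:
            for i in input_ids:
                del self.records[i]  # content is no longer referenced by the session

    def held_bytes(self):
        return sum(len(content) for content in self.records.values())


def run(purge_inputs):
    session = ToySession()
    ids = list(range(1000))
    for i in ids:
        session.write(i, b"x" * 100)  # 1,000 inputs of 100 bytes each
    session.merge(ids, output_id="merged", purge_inputs=purge_inputs)
    return session.held_bytes()


lazy = run(purge_inputs=False)   # inputs stay referenced until the session ends
eager = run(purge_inputs=True)   # inputs are dropped as soon as they are merged
print(lazy, eager)  # 200000 100000
```

In a real in-memory repository, dropping the reference only makes the content eligible for garbage collection, so the actual savings would depend on when the collector runs.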


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
