[
https://issues.apache.org/jira/browse/NIFI-11584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Payne updated NIFI-11584:
------------------------------
Status: Patch Available (was: Open)
> MergeContent can be more efficient in terms of disk access
> ----------------------------------------------------------
>
> Key: NIFI-11584
> URL: https://issues.apache.org/jira/browse/NIFI-11584
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework, Extensions
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
> Fix For: 1.latest, 2.latest
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Long ago (NIFI-516), we updated MergeContent so that when it read from a
> FlowFile, it asked the ProcessSession not to manage the InputStream and
> instead to close the InputStream as soon as reading finished. This was done
> because if we had, say, 50,000 FlowFiles to merge together, we'd have 50,000
> ProcessSessions. Since the session by default holds open the InputStream
> until the session is committed/rolled back, we would hold open 50,000
> FileInputStreams. This would quickly lead to IOExceptions due to "too many
> open files". So in NIFI-516, we addressed the issue by not holding the stream
> open.
> Then, in NIFI-2850, we made things much more efficient by allowing FlowFiles
> to be moved from one ProcessSession to another. So now, instead of using
> 50,000 ProcessSessions, we have a single ProcessSession for the whole bin.
> However, we did not change the behavior of asking the ProcessSession not to
> hold open the stream. We can now allow the ProcessSession to manage the
> InputStream as it does elsewhere.
> Additionally, looking at the codebase, MergeContent is the only component
> that uses this feature of the ProcessSession, and it is a bad practice because
> the ProcessSession.migrate capability makes it unnecessary to ever do this.
> As a result, we should deprecate the {{void read(FlowFile source, boolean
> allowSessionStreamManagement, InputStreamCallback reader) throws
> FlowFileAccessException}} method in 1.x and remove it in 2.0.
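
The open-file arithmetic in the description can be illustrated with a small toy model. This is only a sketch, not NiFi code: `ToySession`, `CountingStream`, and `peakOpenStreams` are hypothetical stand-ins, the "files" are in-memory byte arrays, and the model assumes a session holds at most one managed stream at a time and releases it on commit (which is how the 50,000 sessions / 50,000 streams figure above reads). The boolean argument mirrors `allowSessionStreamManagement` from the method proposed for deprecation.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;

/** Toy model only: these types are hypothetical stand-ins, NOT the NiFi API. */
class SessionStreamSketch {
    static int open = 0; // streams currently open
    static int peak = 0; // highest value 'open' has reached

    /** In-memory stream that counts itself as an "open file" until closed. */
    static final class CountingStream extends ByteArrayInputStream {
        private boolean closed = false;

        CountingStream(byte[] data) {
            super(data);
            open++;
            peak = Math.max(peak, open);
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                open--;
            }
        }
    }

    /** Assumption: a session holds at most one managed stream until commit. */
    static final class ToySession {
        private CountingStream held;

        int read(byte[] content, boolean allowSessionStreamManagement) {
            releaseHeld(); // drop any stream kept from a previous read
            CountingStream in = new CountingStream(content);
            byte[] buf = new byte[4];
            int total = 0;
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                total += n;
            }
            if (allowSessionStreamManagement) {
                held = in;  // session keeps it open until commit
            } else {
                in.close(); // closed as soon as reading finishes
            }
            return total;
        }

        void commit() {
            releaseHeld();
        }

        private void releaseHeld() {
            if (held != null) {
                held.close();
                held = null;
            }
        }
    }

    /** Reads every FlowFile, commits at the end, and reports the peak open count. */
    static int peakOpenStreams(int flowFiles, boolean sessionPerFlowFile, boolean managed) {
        open = 0;
        peak = 0;
        List<ToySession> sessions = new ArrayList<>();
        if (!sessionPerFlowFile) {
            sessions.add(new ToySession()); // one session for the whole bin
        }
        for (int i = 0; i < flowFiles; i++) {
            if (sessionPerFlowFile) {
                sessions.add(new ToySession());
            }
            ToySession session = sessions.get(sessions.size() - 1);
            session.read(new byte[] {1, 2, 3}, managed);
        }
        for (ToySession s : sessions) {
            s.commit();
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("session per FlowFile, managed:   " + peakOpenStreams(1000, true, true));
        System.out.println("session per FlowFile, unmanaged: " + peakOpenStreams(1000, true, false));
        System.out.println("single session, managed:         " + peakOpenStreams(1000, false, true));
    }
}
```

Under these assumptions, only the first scenario (one session per FlowFile with session-managed streams) lets the open-stream count grow with the number of FlowFiles; once the whole bin shares a single session, letting the session manage the stream no longer does.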