Mark Payne created NIFI-11584:
---------------------------------
Summary: MergeContent can be more efficient in terms of disk access
Key: NIFI-11584
URL: https://issues.apache.org/jira/browse/NIFI-11584
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework, Extensions
Reporter: Mark Payne
Assignee: Mark Payne
Fix For: 1.latest, 2.latest
Long ago (NIFI-516), we updated MergeContent so that when it read from a
FlowFile, it asked the ProcessSession not to manage the InputStream and
instead to close the InputStream when finished reading. This was done because
if we had, say, 50,000 FlowFiles to merge together, we'd have 50,000
ProcessSessions. Since the session by default holds open the InputStream until
the session is committed or rolled back, we would hold open 50,000
FileInputStreams. This would quickly lead to IOExceptions due to "too many open
files". So in NIFI-516, we addressed the issue by not holding the stream open.
Then, in NIFI-2850, we made things much more efficient by allowing FlowFiles to
be moved from one ProcessSession to another. So now, instead of using 50,000
ProcessSessions, we have a single ProcessSession for the whole bin.
However, we did not change the behavior of asking ProcessSession not to hold
open the stream. We can now allow the ProcessSession to manage the InputStream
as it does elsewhere.
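
The reasoning above can be illustrated with a small toy model. This is only a
sketch: ToySession is a hypothetical stand-in, not the NiFi ProcessSession API.
It assumes that a session managing its stream keeps at most one stream open
until commit and reuses it while consecutive reads hit the same underlying
file, and the figure of 100 FlowFiles per file is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class SessionStreamModel {

    /** Hypothetical stand-in for a session; NOT the NiFi ProcessSession API. */
    static final class ToySession {
        private final boolean sessionManagesStream;
        private String openClaim; // file the held stream points at, or null
        private int opens;        // how many times this session opened a file

        ToySession(boolean sessionManagesStream) {
            this.sessionManagesStream = sessionManagesStream;
        }

        void read(String claimFile) {
            // Assumption: a managed stream is reused while reads stay in the same file.
            if (sessionManagesStream && claimFile.equals(openClaim)) {
                return;
            }
            opens++;
            // A managed stream stays open until commit; an unmanaged one is closed at once.
            openClaim = sessionManagesStream ? claimFile : null;
        }

        boolean holdsOpenStream() { return openClaim != null; }
        int opens() { return opens; }
    }

    /** Returns {streams still held open before commit, total file opens}. */
    static int[] simulate(int flowFiles, int flowFilesPerClaim,
                          boolean oneSessionPerFlowFile, boolean sessionManagesStream) {
        List<ToySession> sessions = new ArrayList<>();
        ToySession shared = new ToySession(sessionManagesStream);
        if (!oneSessionPerFlowFile) {
            sessions.add(shared);
        }
        for (int i = 0; i < flowFiles; i++) {
            ToySession session = shared;
            if (oneSessionPerFlowFile) {
                session = new ToySession(sessionManagesStream);
                sessions.add(session);
            }
            // Consecutive FlowFiles are assumed to live in the same file.
            session.read("claim-" + (i / flowFilesPerClaim));
        }
        int held = 0;
        int opens = 0;
        for (ToySession session : sessions) {
            if (session.holdsOpenStream()) {
                held++;
            }
            opens += session.opens();
        }
        return new int[] {held, opens};
    }

    static void report(String label, int[] result) {
        System.out.printf("%-45s held=%d opens=%d%n", label, result[0], result[1]);
    }

    public static void main(String[] args) {
        report("session per FlowFile, managed stream", simulate(50_000, 100, true, true));
        report("session per FlowFile, unmanaged (NIFI-516)", simulate(50_000, 100, true, false));
        report("single session, unmanaged (current)", simulate(50_000, 100, false, false));
        report("single session, managed (proposed)", simulate(50_000, 100, false, true));
    }
}
```

Under these assumptions, only the first case holds 50,000 streams open at once,
and only the last case avoids reopening a file for every FlowFile: it holds a
single stream and performs 500 opens instead of 50,000.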
Additionally, looking at the codebase, MergeContent is the only component that
uses this feature of the ProcessSession, and it is a bad practice because the
ProcessSession.migrate capability makes it unnecessary to ever do this. As a
result, we should deprecate the {{void read(FlowFile source, boolean
allowSessionStreamManagement, InputStreamCallback reader) throws
FlowFileAccessException}} method in 1.x and remove it in 2.0.