Mark Payne created NIFI-11584:
---------------------------------

             Summary: MergeContent can be more efficient in terms of disk access
                 Key: NIFI-11584
                 URL: https://issues.apache.org/jira/browse/NIFI-11584
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework, Extensions
            Reporter: Mark Payne
            Assignee: Mark Payne
             Fix For: 1.latest, 2.latest


Long ago (NIFI-516), we updated MergeContent so that when it read from a 
FlowFile, it asked the ProcessSession not to manage the InputStream and 
instead to close the InputStream as soon as reading finished. This was done 
because if we had, say, 50,000 FlowFiles to merge together, we'd have 50,000 
ProcessSessions. Since the session by default holds open the InputStream until 
the session is committed/rolled back, we would hold open 50,000 
FileInputStreams. This would quickly lead to IOExceptions due to "too many open 
files". So in NIFI-516, we addressed the issue by not holding the stream open.

Then, in NIFI-2850 we made things much more efficient by allowing FlowFiles to 
be moved from one ProcessSession to another. So now instead of using 50,000 
ProcessSessions, we have a single ProcessSession for the whole bin.

However, we did not change the behavior of asking ProcessSession not to hold 
open the stream. We can now allow the ProcessSession to manage the InputStream 
as it does elsewhere.
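
The difference between the three arrangements above can be illustrated with a 
small, self-contained toy model. This is only a sketch: ToySession, 
handlesHeldBeforeCommit and the openHandles counter are hypothetical names, not 
the NiFi API, and the model assumes that a session keeps at most one managed 
stream open at a time and releases it on the next read or on commit.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class StreamManagementSketch {

    /** Counts streams that are currently open across all toy sessions. */
    static int openHandles = 0;

    /** Hypothetical stand-in for a session; this is NOT the NiFi ProcessSession API. */
    static class ToySession {
        // Assumption of this sketch: a session holds at most one managed stream.
        private InputStream managedStream;

        void read(byte[] content, boolean allowSessionStreamManagement) throws IOException {
            closeManagedStream(); // a new read replaces any stream the session still holds
            InputStream in = new ByteArrayInputStream(content);
            openHandles++;
            while (in.read() != -1) {
                // consume the content, as a merge would
            }
            if (allowSessionStreamManagement) {
                managedStream = in; // held open until the next read or commit
            } else {
                in.close();         // closed immediately by the caller's request
                openHandles--;
            }
        }

        void commit() throws IOException {
            closeManagedStream();
        }

        private void closeManagedStream() throws IOException {
            if (managedStream != null) {
                managedStream.close();
                managedStream = null;
                openHandles--;
            }
        }
    }

    /** Returns how many streams are still open just before the sessions are committed. */
    static int handlesHeldBeforeCommit(int flowFiles, boolean oneSessionPerFlowFile,
                                       boolean allowSessionStreamManagement) {
        try {
            openHandles = 0;
            List<ToySession> sessions = new ArrayList<>();
            ToySession shared = new ToySession();
            if (!oneSessionPerFlowFile) {
                sessions.add(shared);
            }
            for (int i = 0; i < flowFiles; i++) {
                ToySession session = shared;
                if (oneSessionPerFlowFile) {
                    session = new ToySession();
                    sessions.add(session);
                }
                session.read(new byte[] {1, 2, 3}, allowSessionStreamManagement);
            }
            int held = openHandles;
            for (ToySession s : sessions) {
                s.commit();
            }
            return held;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // One session per FlowFile, session-managed streams (before NIFI-516)
        System.out.println(handlesHeldBeforeCommit(1000, true, true));   // prints 1000
        // One session per FlowFile, streams closed right away (NIFI-516)
        System.out.println(handlesHeldBeforeCommit(1000, true, false));  // prints 0
        // Single session for the whole bin, session-managed streams (this proposal)
        System.out.println(handlesHeldBeforeCommit(1000, false, true));  // prints 1
    }
}
```

In this model the number of open handles grows with the number of sessions, 
not with the number of FlowFiles, which is why a single session per bin can 
manage the stream without hitting the open-file limit.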

Additionally, looking at the codebase, MergeContent is the only component that 
uses this feature of the ProcessSession. It is a bad practice, because the 
ProcessSession.migrate capability makes it unnecessary to ever do this. As a 
result, we should deprecate the {{void read(FlowFile source, boolean 
allowSessionStreamManagement, InputStreamCallback reader) throws 
FlowFileAccessException}} method in 1.x and remove it in 2.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
