[ 
https://issues.apache.org/jira/browse/NIFI-11584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Payne updated NIFI-11584:
------------------------------
    Status: Patch Available  (was: Open)

> MergeContent can be more efficient in terms of disk access
> ----------------------------------------------------------
>
>                 Key: NIFI-11584
>                 URL: https://issues.apache.org/jira/browse/NIFI-11584
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 1.latest, 2.latest
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Long ago (NIFI-516), we updated MergeContent so that when it read from a
> FlowFile, it asked the ProcessSession not to manage the InputStream and
> instead to close the InputStream as soon as reading finished. This was done
> because if we had, say, 50,000 FlowFiles to merge together, we'd have 50,000
> ProcessSessions. Since the session by default holds open the InputStream
> until the session is committed/rolled back, we would hold open 50,000
> FileInputStreams. This would quickly lead to IOExceptions due to "too many
> open files". So in NIFI-516, we addressed the issue by not holding the stream
> open.
> Then, in NIFI-2850, we made things much more efficient by allowing FlowFiles
> to be moved from one ProcessSession to another. So now, instead of using
> 50,000 ProcessSessions, we have a single ProcessSession for the whole bin.
> However, we did not change the behavior of asking the ProcessSession not to
> hold open the stream. We can now allow the ProcessSession to manage the
> InputStream as it does elsewhere.
> Additionally, looking at the codebase, MergeContent is the only component
> that uses this feature of the ProcessSession, and doing so is bad practice
> because the ProcessSession.migrate capability makes it unnecessary.
> As a result, we should deprecate the {{void read(FlowFile source, boolean
> allowSessionStreamManagement, InputStreamCallback reader) throws
> FlowFileAccessException}} method in 1.x and remove it in 2.0.
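
The open-file arithmetic in the description can be illustrated with a small, self-contained Java sketch. It is a toy model only and does not use the NiFi API: ToySession, OpenStreamCounter and the helper methods are hypothetical names invented for this illustration, and the model assumes that a session keeps at most one stream open until it is committed. It compares one session per FlowFile (the situation before NIFI-2850) with a single session for the whole bin.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class StreamManagementSketch {

    /** Tracks how many streams are open right now and the highest count seen. */
    static final class OpenStreamCounter {
        private int open;
        private int peak;
        void opened() { open++; peak = Math.max(peak, open); }
        void closed() { open--; }
        int peak() { return peak; }
    }

    /** Toy stand-in for a session. This is NOT NiFi's ProcessSession. */
    static final class ToySession {
        private final OpenStreamCounter counter;
        private InputStream held; // assumption: at most one stream is held until commit

        ToySession(OpenStreamCounter counter) { this.counter = counter; }

        /** Reads all bytes; if sessionManaged, the stream stays open until commit(). */
        int read(byte[] content, boolean sessionManaged) {
            releaseHeld(); // a new read replaces any stream this session still holds
            InputStream in = new ByteArrayInputStream(content);
            counter.opened();
            try {
                int total = 0;
                while (in.read() != -1) {
                    total++;
                }
                if (sessionManaged) {
                    held = in;        // kept open until commit
                } else {
                    in.close();       // closed as soon as reading finishes
                    counter.closed();
                }
                return total;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        void commit() { releaseHeld(); }

        private void releaseHeld() {
            if (held == null) {
                return;
            }
            try {
                held.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            counter.closed();
            held = null;
        }
    }

    /** One session per FlowFile; all sessions are committed only after every read. */
    static int peakWithSessionPerFlowFile(int flowFiles, boolean sessionManaged) {
        OpenStreamCounter counter = new OpenStreamCounter();
        List<ToySession> sessions = new ArrayList<>();
        for (int i = 0; i < flowFiles; i++) {
            ToySession session = new ToySession(counter);
            session.read(new byte[] {1, 2, 3}, sessionManaged);
            sessions.add(session);
        }
        for (ToySession session : sessions) {
            session.commit();
        }
        return counter.peak();
    }

    /** A single session for the whole bin, with session-managed streams. */
    static int peakWithSingleSession(int flowFiles) {
        OpenStreamCounter counter = new OpenStreamCounter();
        ToySession session = new ToySession(counter);
        for (int i = 0; i < flowFiles; i++) {
            session.read(new byte[] {1, 2, 3}, true);
        }
        session.commit();
        return counter.peak();
    }

    public static void main(String[] args) {
        System.out.println(peakWithSessionPerFlowFile(1000, true));  // many sessions, managed
        System.out.println(peakWithSessionPerFlowFile(1000, false)); // many sessions, unmanaged
        System.out.println(peakWithSingleSession(1000));             // one session, managed
    }
}
```

In this model, main prints 1000, 1 and 1: the peak number of open streams grows with the number of sessions only when each session manages its own stream, which is the case NIFI-516 worked around. With a single session per bin the peak stays at one even with session-managed streams.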



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
