JackHintonSmartDCSIT commented on PR #8691:
URL: https://github.com/apache/nifi/pull/8691#issuecomment-2135241225

   > > > Thanks for the continued work on this Processor @JackHintonSmartDCSIT, 
and thanks for the thorough review thus far @dan-s1.
   > > > Reviewing the latest version, there is a serious concern with reading 
the entire input FlowFile content into memory.
   > > > I have not evaluated all the details of the `PCAP` class, but it looks 
like it should be possible to read packets in a streaming fashion, which would 
avoid significant memory consumption. Although the current approach could work 
for small files, this needs to be refactored as it will not scale for larger 
files.
   > > 
   > > 
   > > The assumption used in the creation of this processor was that the 
flowfile to be split would already be present in NiFi in order to be processed, 
which (one would assume) necessitates that the PCAP file is already present in 
memory in its entirety. I don't disagree with your assessment, but I would 
like to clarify: is there a way of streaming flowfiles between processors in 
such a way that they aren't required to be held entirely in memory? If so, can 
you point me to a processor that uses this mechanism or some documentation of 
it so I can adopt it here? Thanks!
   > 
   > Thanks for the reply @JackHintonSmartDCSIT.
   > 
   > FlowFile content in a NiFi queue does not use heap memory, and instead 
uses a persistent repository. This allows NiFi to process very large files with 
relatively small amounts of memory in many cases. The vast majority of NiFi 
processors do not read the entire FlowFile content into memory, and those that 
do are more exceptional cases.
   > 
   > The `SplitRecord` Processor is one example of a Processor that reads an 
InputStream, consisting of an unknown number of Records, and then writes a 
segmented number of Records to one or more output FlowFiles. Some of the other 
Split Processors follow a similar pattern.
   > 
   > Based on the code as it stands, it looks like it will be necessary to 
refactor the PCAP reading to emit a stream of packets. The `java.util.Iterator` 
interface is one general way to model this strategy, but the details depend on 
consuming bytes from an InputStream, then writing packets to a FlowFile 
OutputStream, which avoids retaining a large number of objects in memory. I 
realize this will take some work to refactor, but reviewing some of the 
existing Split Processors should provide some helpful background examples.
   
   Thanks for your response! Out of curiosity, is there a way to stream large 
files into NiFi and manipulate them as they're streamed in?
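
   For anyone following the thread, the `java.util.Iterator` approach suggested 
in the quoted review could look roughly like the sketch below. It is only an 
illustration under stated assumptions: `PcapRecordIterator` is a hypothetical 
name that is not part of this PR's `PCAP` class or of NiFi, and it assumes the 
classic PCAP layout (24-byte global header, then a 16-byte record header 
followed by the captured bytes for each packet). It does not handle pcapng or 
the nanosecond-resolution magic number. Only one record is held in memory at a 
time.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical helper (not part of the PR): iterates over the raw records
 * (16-byte record header plus captured bytes) of a classic PCAP stream.
 */
class PcapRecordIterator implements Iterator<byte[]> {
    private static final int GLOBAL_HEADER_LENGTH = 24;
    private static final int RECORD_HEADER_LENGTH = 16;

    private final DataInputStream in;
    private final ByteOrder order;
    private final byte[] globalHeader = new byte[GLOBAL_HEADER_LENGTH];
    private byte[] next; // record read ahead of the caller, or null at end of stream

    PcapRecordIterator(InputStream inputStream) throws IOException {
        in = new DataInputStream(inputStream);
        in.readFully(globalHeader);
        // The magic number reveals the byte order used by the writer.
        int magic = ByteBuffer.wrap(globalHeader).order(ByteOrder.BIG_ENDIAN).getInt();
        if (magic == 0xA1B2C3D4) {
            order = ByteOrder.BIG_ENDIAN;
        } else if (magic == 0xD4C3B2A1) {
            order = ByteOrder.LITTLE_ENDIAN;
        } else {
            throw new IOException("Not a classic PCAP stream");
        }
        next = readRecord();
    }

    /** Copy of the global header, so each output file can start with it. */
    byte[] getGlobalHeader() {
        return globalHeader.clone();
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public byte[] next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        byte[] current = next;
        try {
            next = readRecord();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return current;
    }

    /** Reads one record, or returns null when the stream ends on a record boundary. */
    private byte[] readRecord() throws IOException {
        int first = in.read();
        if (first < 0) {
            return null;
        }
        byte[] header = new byte[RECORD_HEADER_LENGTH];
        header[0] = (byte) first;
        in.readFully(header, 1, RECORD_HEADER_LENGTH - 1);
        // Captured length (incl_len) sits at offset 8 of the record header.
        int capturedLength = ByteBuffer.wrap(header).order(order).getInt(8);
        if (capturedLength < 0) {
            throw new IOException("Invalid captured length: " + capturedLength);
        }
        byte[] record = new byte[RECORD_HEADER_LENGTH + capturedLength];
        System.arraycopy(header, 0, record, 0, RECORD_HEADER_LENGTH);
        in.readFully(record, RECORD_HEADER_LENGTH, capturedLength);
        return record;
    }
}
```

   In a Processor, I assume the input would come from the `InputStream` that 
`ProcessSession.read(FlowFile)` returns, with the global header and each record 
written to an output FlowFile's `OutputStream`, starting a new FlowFile 
whenever a size or packet-count limit is reached. A real implementation would 
probably also want to check the captured length against the snapshot length in 
the global header before allocating.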


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.