JackHintonSmartDCSIT commented on PR #8691:
URL: https://github.com/apache/nifi/pull/8691#issuecomment-2144852942

   > The `SplitRecord` Processor is one example of a Processor that reads an 
InputStream, consisting of an unknown number of Records, and then writes a 
segmented number of Records to one or more output FlowFiles. Some of the other 
Split Processors follow a similar pattern.
   > 
   > Based on the code as it stands, it looks like it will be necessary to 
refactor the PCAP reading to emit a stream of packets. The `java.util.Iterator` 
interface is one general way to model this strategy, but the details depend on 
consuming bytes from an InputStream, then writing packets to a FlowFile 
OutputStream, which avoids retaining a large number of objects in memory. I 
realize this will take some work to refactor, but reviewing some of the 
existing Split Processors should provide some helpful background examples.
   
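A minimal sketch of the `java.util.Iterator` approach from the quoted suggestion. It assumes the classic libpcap file layout: a 24-byte global header whose magic number gives the byte order, then a 16-byte record header before each packet, with the captured length at offset 8 of that record header. `PcapPacketIterator` is a hypothetical name, not an existing NiFi class, and pcapng input is not handled. Only the next packet is read ahead, so memory use is bounded by the largest single packet rather than by the whole capture.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical iterator that lazily reads one classic-pcap packet at a time. */
class PcapPacketIterator implements Iterator<byte[]> {
    private static final int GLOBAL_HEADER_LENGTH = 24;
    private static final int RECORD_HEADER_LENGTH = 16;

    private final DataInputStream in;
    private final ByteOrder order;
    private byte[] next; // packet data read ahead of the caller; null at end of stream

    PcapPacketIterator(InputStream source) throws IOException {
        this.in = new DataInputStream(source);
        byte[] header = new byte[GLOBAL_HEADER_LENGTH];
        in.readFully(header);
        // Read the magic number big-endian; its value reveals the file's byte order
        // (microsecond 0xa1b2c3d4 or nanosecond 0xa1b23c4d variants).
        int magic = ByteBuffer.wrap(header, 0, 4).order(ByteOrder.BIG_ENDIAN).getInt();
        if (magic == 0xa1b2c3d4 || magic == 0xa1b23c4d) {
            order = ByteOrder.BIG_ENDIAN;
        } else if (magic == 0xd4c3b2a1 || magic == 0x4d3cb2a1) {
            order = ByteOrder.LITTLE_ENDIAN;
        } else {
            throw new IOException("Not a classic pcap stream");
        }
        next = readPacket();
    }

    /** Reads one record header plus its packet bytes, or returns null at a clean end of stream. */
    private byte[] readPacket() throws IOException {
        int first = in.read();
        if (first == -1) {
            return null;
        }
        byte[] recordHeader = new byte[RECORD_HEADER_LENGTH];
        recordHeader[0] = (byte) first;
        in.readFully(recordHeader, 1, RECORD_HEADER_LENGTH - 1);
        // incl_len: number of captured bytes that follow this record header
        int capturedLength = ByteBuffer.wrap(recordHeader, 8, 4).order(order).getInt();
        if (capturedLength < 0) {
            throw new IOException("Invalid captured length: " + capturedLength);
        }
        byte[] data = new byte[capturedLength];
        in.readFully(data);
        return data;
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public byte[] next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        byte[] current = next;
        try {
            next = readPacket();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return current;
    }
}
```

A caller would loop with `while (it.hasNext())`, writing each packet to the current output before pulling the next one, so no list of parsed packets accumulates.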
   I've been looking through the codebase, and I can't find any instance where an 
`OutputStream` is used without either an `OutputStreamCallback` (which seems 
incompatible with the `InputStreamCallback` used in `SplitRecord`) or a 
`RecordWriter` (which needs a schema and a controller service). Is writing a 
controller service required in order to follow this pattern, or is there an 
example of another approach somewhere that I've missed?
   
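For comparison, a sketch of writing split output as raw bytes to a plain `java.io.OutputStream`, with no `RecordWriter`, schema, or controller service involved. It assumes the classic pcap layout (24-byte global header, 16-byte record headers with the captured length at offset 8) and copies one packet at a time into outputs of at most `packetsPerSplit` packets, repeating the original global header at the start of each output so that every split is a valid capture on its own. `PcapSplitter` is hypothetical, and the `Supplier<OutputStream>` parameter is a placeholder for however each child FlowFile's output stream is actually obtained; it is not a NiFi API.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.Supplier;

/** Hypothetical helper that copies one classic pcap stream into several smaller ones. */
final class PcapSplitter {
    private PcapSplitter() {}

    /** Returns the number of outputs written; holds at most one packet in memory. */
    static int split(InputStream in, int packetsPerSplit, Supplier<OutputStream> newSplit) throws IOException {
        if (packetsPerSplit < 1) {
            throw new IllegalArgumentException("packetsPerSplit must be at least 1");
        }
        DataInputStream data = new DataInputStream(in);
        byte[] globalHeader = new byte[24];
        data.readFully(globalHeader);
        // ByteBuffer defaults to big-endian, so the magic number value reveals the file's byte order.
        int magic = ByteBuffer.wrap(globalHeader, 0, 4).getInt();
        ByteOrder order;
        if (magic == 0xa1b2c3d4 || magic == 0xa1b23c4d) {
            order = ByteOrder.BIG_ENDIAN;
        } else if (magic == 0xd4c3b2a1 || magic == 0x4d3cb2a1) {
            order = ByteOrder.LITTLE_ENDIAN;
        } else {
            throw new IOException("Not a classic pcap stream");
        }

        byte[] recordHeader = new byte[16];
        OutputStream out = null;
        int packetsInSplit = 0;
        int splits = 0;
        try {
            int first;
            while ((first = data.read()) != -1) {
                recordHeader[0] = (byte) first;
                data.readFully(recordHeader, 1, 15);
                int capturedLength = ByteBuffer.wrap(recordHeader, 8, 4).order(order).getInt();
                if (capturedLength < 0) {
                    throw new IOException("Invalid captured length: " + capturedLength);
                }
                byte[] packet = new byte[capturedLength];
                data.readFully(packet);

                // Current split is full: close it so a new one is started below.
                if (out != null && packetsInSplit == packetsPerSplit) {
                    out.close();
                    out = null;
                }
                if (out == null) {
                    out = newSplit.get();
                    out.write(globalHeader); // each split starts with its own copy of the global header
                    packetsInSplit = 0;
                    splits++;
                }
                out.write(recordHeader);
                out.write(packet);
                packetsInSplit++;
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return splits;
    }
}
```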
   Also, in `SplitRecord` the `splits` array holds the data processed within the 
`InputStreamCallback` that splits the original record, but the individual 
split records aren't returned by the callback; they're just appended to the 
`splits` array whilst the callback is running. Does that mean the callback 
itself is synchronous, or does something in `OutputStream` handle blocking to 
prevent race conditions?
   
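The synchronous-callback question can be illustrated in plain Java. This sketch does not reproduce NiFi's `ProcessSession.read`; `CallbackDemo`, `StreamCallback`, and `read` are hypothetical stand-ins. It assumes the read method invokes the callback on the calling thread and returns only after the callback finishes. Under that assumption, a local list filled inside the callback, like the `splits` collection in `SplitRecord`, is complete once the call returns, with no extra locking.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class CallbackDemo {
    /** Hypothetical stand-in for an InputStreamCallback-style interface. */
    interface StreamCallback {
        void process(InputStream in) throws IOException;
    }

    /** Hypothetical read method: calls the callback on this thread and returns only afterwards. */
    static void read(byte[] content, StreamCallback callback) throws IOException {
        try (InputStream in = new ByteArrayInputStream(content)) {
            callback.process(in);
        }
    }

    /** Appends one entry per byte to a local list from inside the callback. */
    static List<Integer> collectSplits(byte[] content) throws IOException {
        List<Integer> splits = new ArrayList<>();
        read(content, in -> {
            int b;
            while ((b = in.read()) != -1) {
                splits.add(b); // runs on the same thread that called collectSplits
            }
        });
        // read() returned only after process() finished, so the list is complete here.
        return splits;
    }
}
```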