JackHintonSmartDCSIT commented on PR #8691: URL: https://github.com/apache/nifi/pull/8691#issuecomment-2144852942
> The `SplitRecord` Processor is one example of a Processor that reads an InputStream, consisting of an unknown number of Records, and then writes a segmented number of Records to one or more output FlowFiles. Some of the other Split Processors follow a similar pattern.
>
> Based on the code as it stands, it looks like it will be necessary to refactor the PCAP reading to emit a stream of packets. The `java.util.Iterator` interface is one general way to model this strategy, but the details depend on consuming bytes from an InputStream, then writing packets to a FlowFile OutputStream, which avoids retaining a large number of objects in memory. I realize this will take some work to refactor, but reviewing some of the existing Split Processors should provide some helpful background examples.

I've been looking through the codebase and I can't find any instance where `OutputStream` is used without either an `OutputStreamCallback` (which seems incompatible with the `InputStreamCallback` used in `SplitRecord`) or a `RecordWriter` (which needs a schema and a controller service). Is writing a controller service required in order to follow this pattern, or is there an example of another approach somewhere that I've missed?

Also, in `SplitRecord` the `splits` array holds the data produced inside the `InputStreamCallback` that splits the original record, but the individual split records aren't returned by the callback; they are just appended to `splits` while the callback is running. Does that mean the callback itself is synchronous, or does something in `OutputStream` handle blocking to ensure there are no race conditions?
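To make the second question concrete, here is a minimal sketch of the pattern as I understand it, written against plain `java.io` rather than the NiFi API. Everything in it is an assumption for illustration: `StreamCallback` and `read` are hypothetical stand-ins for `InputStreamCallback` and the session's read method, the one-byte length prefix is an invented record format (not PCAP), and the callback is assumed to run on the calling thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {

    /** Hypothetical stand-in for a callback type such as InputStreamCallback. */
    interface StreamCallback {
        void process(InputStream in) throws IOException;
    }

    /** Hypothetical stand-in for a session read: invokes the callback on the calling thread. */
    static void read(byte[] content, StreamCallback callback) throws IOException {
        try (InputStream in = new ByteArrayInputStream(content)) {
            callback.process(in);
        }
    }

    /** Groups length-prefixed records into splits of at most perSplit records each. */
    static List<byte[]> split(byte[] content, int perSplit) throws IOException {
        // Filled from inside the callback, not returned by it.
        List<byte[]> splits = new ArrayList<>();

        read(content, in -> {
            DataInputStream data = new DataInputStream(in);
            ByteArrayOutputStream current = new ByteArrayOutputStream();
            int count = 0;

            while (true) {
                int length;
                try {
                    // Invented format: one unsigned length byte, then the payload.
                    length = data.readUnsignedByte();
                } catch (EOFException e) {
                    break; // no more records
                }
                byte[] payload = new byte[length];
                data.readFully(payload);

                // Only one record is held in memory before it is written out.
                current.write(length);
                current.write(payload);

                if (++count == perSplit) {
                    splits.add(current.toByteArray());
                    current.reset();
                    count = 0;
                }
            }
            if (count > 0) {
                splits.add(current.toByteArray()); // trailing partial split
            }
        });

        // Under the same-thread assumption, the list is complete here.
        return splits;
    }
}
```

If that assumption holds, `splits` needs no extra synchronization, because `read` does not return until the callback has finished. For example, calling `SplitSketch.split(input, 2)` on three records would return two byte arrays, the first holding two records and the second holding one.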
