Github user ijokarumawak commented on a diff in the pull request:
    --- Diff: 
    @@ -242,11 +279,12 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
             final boolean allowExtraFields = 
             final boolean strictTypeChecking = 
    -        RecordSetWriter validWriter = null;
    -        RecordSetWriter invalidWriter = null;
             FlowFile validFlowFile = null;
             FlowFile invalidFlowFile = null;
    +        final List<Record> validRecords = new LinkedList<>();
    --- End diff --
    Hi @martin-mucha 
    Let me try to answer your question. @markap14 will correct me if I'm wrong 
 method calls `writer.finishRecordSet()`, which lets the writer write the 
end marker of the record set, since some record formats require one, e.g. JSON '}' 
or XML '</root>' would be easy to imagine. The actual bytes of the record contents 
have already been written by that point.
    I'd recommend reading [NiFi in depth, Content 
 on how NiFi reads/writes FlowFile content in a streaming manner without loading 
the whole content onto the heap.
    If you're interested in reading code, 
 might be a good starting point for how FlowFile and its OutputStream is 
    And the OutputStream is passed to RecordSetWriter implementations. For 
example, when a processor writes a record, it is sent to a method of the 
configured RecordSetWriter like this, 
    These RecordSetWriters do not hold content on the heap. They write records 
in a streaming manner.
    If we create a List and hold `Record` instances, then we keep the content on 
the heap as `Record` instances, which can lead to an OOM.
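    To illustrate the streaming idea, here is a minimal sketch. It is not NiFi 
code: `JsonArrayWriter` and `writeStreaming` are hypothetical names made up for 
this example, records are plain JSON strings, and a `ByteArrayOutputStream` 
stands in for the FlowFile's OutputStream only so the sketch runs on its own. 
Each record is written to the stream as soon as it is produced, and 
`finishRecordSet()` only appends the closing marker.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamingWriterSketch {

    /** Hypothetical stand-in for a record writer; NOT NiFi's RecordSetWriter. */
    static final class JsonArrayWriter {
        private final OutputStream out;
        private boolean first = true;

        JsonArrayWriter(OutputStream out) {
            this.out = out;
        }

        // Writes the opening marker of the record set.
        void beginRecordSet() throws IOException {
            out.write('[');
        }

        // The record's bytes go to the stream immediately; the writer keeps no
        // reference to the record afterwards.
        void write(String jsonRecord) throws IOException {
            if (!first) {
                out.write(',');
            }
            out.write(jsonRecord.getBytes(StandardCharsets.UTF_8));
            first = false;
        }

        // Only the closing marker is written here; the record bytes are
        // already in the stream.
        void finishRecordSet() throws IOException {
            out.write(']');
            out.flush();
        }
    }

    public static String writeStreaming(int count) throws IOException {
        // Assumption: an in-memory sink is used purely to make the sketch
        // runnable. It stands in for a stream that is not backed by the heap.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        JsonArrayWriter writer = new JsonArrayWriter(sink);

        writer.beginRecordSet();
        for (int i = 0; i < count; i++) {
            // One record exists at a time; no List of records is built.
            writer.write("{\"id\":" + i + "}");
        }
        writer.finishRecordSet();

        return sink.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeStreaming(3)); // prints [{"id":0},{"id":1},{"id":2}]
    }
}
```

    With this shape, the memory used by the writer does not grow with the 
number of records, whereas collecting the records into a List first would 
keep all of them reachable until the List is drained.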
    Hope this helps!

