Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2425#discussion_r168057832
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateRecord.java ---
    @@ -242,11 +279,12 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
             final boolean allowExtraFields = context.getProperty(ALLOW_EXTRA_FIELDS).asBoolean();
             final boolean strictTypeChecking = context.getProperty(STRICT_TYPE_CHECKING).asBoolean();
     
    -        RecordSetWriter validWriter = null;
    -        RecordSetWriter invalidWriter = null;
             FlowFile validFlowFile = null;
             FlowFile invalidFlowFile = null;
     
    +        final List<Record> validRecords = new LinkedList<>();
    --- End diff --
    
    Hi @martin-mucha 
    Let me try to answer your question. @markap14 will correct me if I'm wrong 
:)
    
    
[ValidateRecord.completeFlowFile](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateRecord.java#L408)
 method calls `writer.finishRecordSet()`, which lets the writer write the 
end marker of the record set, since some record formats require one; a JSON '}' 
or an XML '</root>' are easy examples to imagine. The actual bytes of the record 
contents have already been written by that point.
    
    I'd recommend reading [NiFi in depth, Content 
Repository](https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html#content-repository)
 on how NiFi reads and writes FlowFile content in a streaming manner, without 
loading the whole content onto the heap.
    
    If you're interested in reading code, 
[StandardProcessSession.write](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/StandardProcessSession.java#L2433)
 might be a good starting point for seeing how a FlowFile and its OutputStream 
are created.
    
    That OutputStream is then passed to the RecordSetWriter implementation. For 
example, when a processor writes a record, the record is handed to a method of 
the configured RecordSetWriter such as this one:
    
[WriteCSVResult.writeRecord](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/WriteCSVResult.java#L147).
    
    These RecordSetWriters do not hold content on the heap. They write records 
in a streaming manner.
    
    If we create a List and hold `Record` instances in it, we keep the content 
on the heap as `Record` instances, which can lead to an OOM.
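    
    As a rough illustration only (this is not NiFi code: `SimpleJsonArrayWriter` 
and `streamRecords` are hypothetical names, and the real `RecordSetWriter` API 
differs), a streaming writer looks something like the sketch below. Each record 
is written to the OutputStream as soon as it is read, so only one record is 
held at a time, and `finishRecordSet()` only appends the closing marker. In the 
demo a ByteArrayOutputStream stands in for the FlowFile's content stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.stream.IntStream;

class StreamingWriterSketch {

    /** Hypothetical stand-in for a record set writer; NOT the NiFi interface. */
    static final class SimpleJsonArrayWriter {
        private final OutputStream out;
        private int recordCount = 0;

        SimpleJsonArrayWriter(OutputStream out) {
            this.out = out;
        }

        /** Writes the opening marker of the record set. */
        void beginRecordSet() throws IOException {
            out.write('[');
        }

        /** Writes one record straight to the stream; nothing is retained. */
        void write(String jsonRecord) throws IOException {
            if (recordCount++ > 0) {
                out.write(',');
            }
            out.write(jsonRecord.getBytes(StandardCharsets.UTF_8));
        }

        /** Writes only the closing marker; record bytes are already out. */
        int finishRecordSet() throws IOException {
            out.write(']');
            out.flush();
            return recordCount;
        }
    }

    /** Pulls records one at a time and streams them; no List of records. */
    static int streamRecords(Iterator<String> records, OutputStream out) throws IOException {
        SimpleJsonArrayWriter writer = new SimpleJsonArrayWriter(out);
        writer.beginRecordSet();
        while (records.hasNext()) {
            writer.write(records.next());
        }
        return writer.finishRecordSet();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the FlowFile content stream (an assumption for the demo).
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Iterator<String> records = IntStream.range(0, 3)
                .mapToObj(i -> "{\"id\":" + i + "}")
                .iterator();
        int count = streamRecords(records, out);
        System.out.println(count + " records: "
                + new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

    Collecting the same records into a `List` first would keep every one of 
them reachable until the list is written out, which is the heap growth 
described above.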
    
    Hope this helps!

