Github user ijokarumawak commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2425#discussion_r168057832
--- Diff:
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateRecord.java
---
@@ -242,11 +279,12 @@ public void onTrigger(final ProcessContext context,
final ProcessSession session
final boolean allowExtraFields =
context.getProperty(ALLOW_EXTRA_FIELDS).asBoolean();
final boolean strictTypeChecking =
context.getProperty(STRICT_TYPE_CHECKING).asBoolean();
- RecordSetWriter validWriter = null;
- RecordSetWriter invalidWriter = null;
FlowFile validFlowFile = null;
FlowFile invalidFlowFile = null;
+ final List<Record> validRecords = new LinkedList<>();
--- End diff --
Hi @martin-mucha
Let me try to answer your question. @markap14 will correct me if I'm wrong
:)
[ValidateRecord.completeFlowFile](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateRecord.java#L408)
method calls `writer.finishRecordSet()`, which lets the writer write the
closing marker of the record set, since some record formats require one; a JSON
'}' or an XML '</root>' are easy examples to imagine. The actual bytes of the
record contents have already been written by that point.
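To illustrate the idea, here is a minimal sketch of a writer that pushes each
record to its OutputStream immediately and only emits the closing marker when
the record set is finished. `JsonArrayWriter` and its methods are hypothetical
names made up for this example; this is not the NiFi `RecordSetWriter` API, only
a toy with the same shape.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a record set writer; NOT the NiFi API.
class StreamingWriterSketch {

    static class JsonArrayWriter {
        private final OutputStream out;
        private int count = 0;

        JsonArrayWriter(OutputStream out) {
            this.out = out;
        }

        // Writes the opening marker of the record set.
        void beginRecordSet() throws IOException {
            out.write('[');
        }

        // Writes one record straight to the stream; nothing is retained here.
        void write(String jsonRecord) throws IOException {
            if (count > 0) {
                out.write(',');
            }
            out.write(jsonRecord.getBytes(StandardCharsets.UTF_8));
            count++;
        }

        // Only the closing marker is written; record bytes are already out.
        int finishRecordSet() throws IOException {
            out.write(']');
            out.flush();
            return count;
        }
    }

    // Writes two records and returns everything that reached the stream.
    static String demo() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        JsonArrayWriter writer = new JsonArrayWriter(sink);
        try {
            writer.beginRecordSet();
            writer.write("{\"id\":1}");
            writer.write("{\"id\":2}");
            writer.finishRecordSet();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this toy the in-memory `ByteArrayOutputStream` is only there so the result
can be inspected; in a processor the stream would be backed by the content
repository instead.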
I'd recommend reading [NiFi in depth, Content
Repository](https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html#content-repository)
on how NiFi reads/writes FlowFile content in a streaming manner without loading
the whole content onto the heap.
If you're interested in reading code,
[StandardProcessSession.write](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/StandardProcessSession.java#L2433)
might be a good starting point for how a FlowFile and its OutputStream are
created.
That OutputStream is then passed to the RecordSetWriter implementation. For
example, when a processor writes a record, the record is handed to a method of
the configured RecordSetWriter such as
[WriteCSVResult.writeRecord](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/WriteCSVResult.java#L147).
These RecordSetWriter implementations do not hold contents on the heap. They
write records in a streaming manner.
If we create a List and hold `Record` instances in it, then we keep the content
on the heap as `Record` instances, which can lead to an OOM.
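A rough sketch of the difference, using plain strings in place of `Record`
instances and a byte-counting stream in place of FlowFile content. All names
here (`BufferVsStreamSketch`, `CountingSink`, `bufferThenWrite`, `streamWrite`)
are hypothetical and exist only for this illustration:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical illustration; strings stand in for Record instances.
class BufferVsStreamSketch {

    // Discards bytes but counts them, standing in for FlowFile content.
    static class CountingSink extends OutputStream {
        long bytes = 0;

        @Override
        public void write(int b) {
            bytes++;
        }
    }

    // Lazily produces n records, like a reader pulling from an input stream.
    static Iterator<String> records(int n) {
        return IntStream.range(0, n).mapToObj(i -> "record-" + i + "\n").iterator();
    }

    // Collects every record in a List before writing.
    // Returns how many records were held on the heap at the same time.
    static int bufferThenWrite(Iterator<String> source, OutputStream out) {
        List<String> held = new ArrayList<>();
        while (source.hasNext()) {
            held.add(source.next());
        }
        int peak = held.size();
        for (String record : held) {
            writeRecord(out, record);
        }
        return peak;
    }

    // Writes each record as soon as it is read, so only one is held at a time.
    static int streamWrite(Iterator<String> source, OutputStream out) {
        int peak = 0;
        while (source.hasNext()) {
            String record = source.next();
            peak = Math.max(peak, 1);
            writeRecord(out, record);
        }
        return peak;
    }

    private static void writeRecord(OutputStream out, String record) {
        try {
            out.write(record.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CountingSink buffered = new CountingSink();
        CountingSink streamed = new CountingSink();
        System.out.println("buffered peak: " + bufferThenWrite(records(1000), buffered));
        System.out.println("streamed peak: " + streamWrite(records(1000), streamed));
        System.out.println("same bytes written: " + (buffered.bytes == streamed.bytes));
    }
}
```

Both approaches write the same bytes, but the buffered one keeps the whole
input on the heap until it starts writing, and that grows with the size of the
incoming FlowFile.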
Hope this helps!
---