Mark Payne created NIFI-4794:
--------------------------------
Summary: Improve Garbage Collection required by Provenance Repository
Key: NIFI-4794
URL: https://issues.apache.org/jira/browse/NIFI-4794
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Reporter: Mark Payne
Assignee: Mark Payne
The EventIdFirstSchemaRecordWriter that is used by the provenance repository
has a writeRecord(ProvenanceEventRecord) method. Within this method, it
serializes the given record into a byte array by serializing to a
ByteArrayOutputStream (after wrapping the BAOS in a DataOutputStream). Once
this is done, it calls toByteArray() on that BAOS so that it can write the
byte[] directly to another OutputStream.
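As a rough illustration only, the pattern described above might look like the sketch below. This is not the actual EventIdFirstSchemaRecordWriter code: the class name UnpooledWriteSketch, the String payload standing in for a ProvenanceEventRecord, and the method signature are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class UnpooledWriteSketch {

    // Hypothetical stand-in for writeRecord(ProvenanceEventRecord): a String
    // payload is used here so the sketch stays self-contained.
    static void writeRecord(String eventPayload, OutputStream destination) throws IOException {
        // A new buffer and a new wrapper are allocated for every record.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);

        // writeUTF allocates a scratch byte[] inside the DataOutputStream.
        dos.writeUTF(eventPayload);
        dos.flush();

        // toByteArray() allocates another array and copies the buffered bytes into it.
        byte[] serialized = baos.toByteArray();
        destination.write(serialized);
    }
}
```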
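This can create a rather large amount of garbage to be collected for every
record written: a new ByteArrayOutputStream and its backing array, a new
DataOutputStream, and the byte[] copy returned by toByteArray(). We can
improve this significantly: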
# Instead of creating a new ByteArrayOutputStream each time, create a pool of
them. This avoids constantly having to garbage collect them.
# If a BAOS grows beyond a certain size, we should not return it to the
pool, because we don't want an oversized buffer to stay on the heap indefinitely.
# Instead of wrapping the BAOS in a new DataOutputStream, the DataOutputStream
should be pooled/recycled as well. Since each DataOutputStream must create its
own internal byte[] for the writeUTF method, reusing it can save a significant
amount of garbage.
# Avoid calling ByteArrayOutputStream.toByteArray(). We can instead just use
ByteArrayOutputStream.writeTo(OutputStream). This avoids allocating a new
array, copying the data into it, and the GC overhead of collecting it afterwards.
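The four points above could be combined roughly as in the following sketch. It is a minimal illustration under stated assumptions, not the change made for this issue: the names PooledRecordBuffers, SizedBuffer, PooledWriter, borrow and giveBack are hypothetical, as are the pool size and the 1 MB retention limit, and a String payload again stands in for a ProvenanceEventRecord.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PooledRecordBuffers {
    // Hypothetical limits, chosen only for illustration.
    static final int MAX_POOLED = 16;
    static final int MAX_RETAINED_BYTES = 1024 * 1024;

    // Exposes the capacity of the backing array (the protected 'buf' field)
    // so the pool can tell when a buffer has grown too large to keep.
    static final class SizedBuffer extends ByteArrayOutputStream {
        int capacity() {
            return buf.length;
        }
    }

    // Pools the buffer and its DataOutputStream together so that the wrapper's
    // internal writeUTF scratch array is reused as well.
    static final class PooledWriter {
        final SizedBuffer buffer = new SizedBuffer();
        final DataOutputStream data = new DataOutputStream(buffer);
    }

    private final BlockingQueue<PooledWriter> pool = new LinkedBlockingQueue<>(MAX_POOLED);

    PooledWriter borrow() {
        PooledWriter writer = pool.poll();
        return writer != null ? writer : new PooledWriter();
    }

    // Returns true if the writer went back into the pool, false if it was dropped.
    boolean giveBack(PooledWriter writer) {
        if (writer.buffer.capacity() > MAX_RETAINED_BYTES) {
            return false; // too large: let it be garbage collected
        }
        writer.buffer.reset(); // sets the count to zero, keeps the backing array
        return pool.offer(writer); // false if the pool is already full
    }

    void writeRecord(String eventPayload, OutputStream destination) throws IOException {
        PooledWriter writer = borrow();
        try {
            writer.data.writeUTF(eventPayload);
            writer.data.flush();
            // Copies straight from the internal buffer, with no intermediate byte[].
            writer.buffer.writeTo(destination);
        } finally {
            giveBack(writer);
        }
    }
}
```

One caveat with this design: a recycled DataOutputStream keeps its running byte counter, so DataOutputStream.size() no longer reflects the size of a single record. The sketch never calls size(), and code that needs the record length can read it from the buffer instead.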