Mark Payne created NIFI-4794:
--------------------------------

             Summary: Improve Garbage Collection required by Provenance Repository
                 Key: NIFI-4794
                 URL: https://issues.apache.org/jira/browse/NIFI-4794
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
            Reporter: Mark Payne
            Assignee: Mark Payne


The EventIdFirstSchemaRecordWriter that is used by the provenance repository 
has a writeRecord(ProvenanceEventRecord) method. Within this method, it 
serializes the given record into a byte array by serializing to a 
ByteArrayOutputStream (after wrapping the BAOS in a DataOutputStream). Once 
this is done, it calls toByteArray() on that BAOS so that it can write the 
byte[] directly to another OutputStream.
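
For illustration, the pattern described above looks roughly like the following. This is a minimal sketch, not the actual NiFi code: the class name PerRecordAllocation is hypothetical, and a plain String stands in for the ProvenanceEventRecord.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of the per-record allocation pattern; not the NiFi source.
final class PerRecordAllocation {

    // 'eventDetails' is an assumed stand-in for a ProvenanceEventRecord.
    static void writeRecord(String eventDetails, OutputStream out) throws IOException {
        // A new buffer and a new wrapper are allocated for every record.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);

        dos.writeUTF(eventDetails);
        dos.flush();

        // toByteArray() allocates a second array and copies the buffer into it.
        byte[] serialized = baos.toByteArray();
        out.write(serialized);
    }
}
```

Every call leaves behind the BAOS, its internal buffer, the DataOutputStream, and the copied byte[] for the collector.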

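Each call therefore allocates a new ByteArrayOutputStream, a new 
DataOutputStream, and a copy of the serialized bytes, which creates a rather 
large amount of garbage to be collected. We can improve this significantly: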
 # Instead of creating a new ByteArrayOutputStream each time, create a pool of 
them. This avoids constantly having to garbage collect them.
 # If a BAOS grows beyond a certain size, we should not return it to the 
pool, because we don't want the pool to keep a very large buffer on the heap.
 # Instead of wrapping the BAOS in a new DataOutputStream, the DataOutputStream 
should be pooled/recycled as well. Since it must create an internal byte[] for 
the writeUTF method, this can save a significant amount of garbage.
 # Avoid calling ByteArrayOutputStream.toByteArray(). We can instead just use 
ByteArrayOutputStream.writeTo(OutputStream). This avoids allocating the new 
array, copying the data into it, and the resulting GC overhead.
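
The four points above could be combined along these lines. This is only a sketch under stated assumptions, not the proposed patch: PooledRecordSerializer, SizedBuffer, the pool size, and the maxRetainedCapacity threshold are all hypothetical names and parameters, and a String again stands in for the ProvenanceEventRecord.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch; names and thresholds are assumptions, not NiFi code.
final class PooledRecordSerializer {

    // Pairs a BAOS with its DataOutputStream so both are recycled together.
    static final class SizedBuffer extends ByteArrayOutputStream {
        final DataOutputStream data = new DataOutputStream(this);

        // 'buf' is the protected backing array of ByteArrayOutputStream.
        int capacity() {
            return buf.length;
        }
    }

    private final BlockingQueue<SizedBuffer> pool;
    private final int maxRetainedCapacity;

    PooledRecordSerializer(int poolSize, int maxRetainedCapacity) {
        this.pool = new ArrayBlockingQueue<>(poolSize);
        this.maxRetainedCapacity = maxRetainedCapacity;
    }

    void writeRecord(String eventDetails, OutputStream out) throws IOException {
        SizedBuffer buffer = pool.poll();          // reuse a pooled buffer if available
        if (buffer == null) {
            buffer = new SizedBuffer();
        }
        try {
            buffer.data.writeUTF(eventDetails);
            buffer.data.flush();
            buffer.writeTo(out);                   // no intermediate byte[] copy
        } finally {
            // Oversized buffers are dropped so the pool does not pin large arrays.
            if (buffer.capacity() <= maxRetainedCapacity) {
                buffer.reset();                    // discard contents, keep the array
                pool.offer(buffer);                // silently dropped if the pool is full
            }
        }
    }

    int pooledCount() {
        return pool.size();
    }
}
```

One caveat of recycling the DataOutputStream: its size() counter keeps accumulating across records, so this sketch does not rely on it.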

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
