[
https://issues.apache.org/jira/browse/NIFI-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322494#comment-16322494
]
ASF GitHub Bot commented on NIFI-3648:
--------------------------------------
Github user markap14 commented on the issue:
https://github.com/apache/nifi/pull/1637
@mosermw sorry about the delay. The changes do look good... it looks like
I'd reviewed on Mar. 30 and then must have forgotten about it. But I was able
to cleanly rebase against master and verify that everything looks good.
Definitely a nice improvement. +1, merged to master.
> Address Excessive Garbage Collection
> ------------------------------------
>
> Key: NIFI-3648
> URL: https://issues.apache.org/jira/browse/NIFI-3648
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework, Extensions
> Reporter: Mark Payne
> Assignee: Mark Payne
> Fix For: 1.6.0
>
>
> We have a lot of places in the codebase where we generate lots of unnecessary
> garbage - especially byte arrays. We need to clean this up in order to
> relieve stress on the garbage collector.
> Specific points that I've found create unnecessary garbage:
> Provenance CompressableRecordWriter creates a new BufferedOutputStream for
> each 'compression block' that it creates. Each one has a 64 KB byte[]. This
> is very wasteful. We should instead subclass BufferedOutputStream so that we
> are able to provide a byte[] to use instead of an int that indicates the
> size. This way, we can just keep re-using the same byte[] that we create for
> each writer. This saves about 32,000 of these 64 KB byte[] for each writer
> that we create. And we create more than 1 of these per minute.
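The subclassing idea described above could be sketched roughly as follows. The class name is hypothetical, but `BufferedOutputStream` really does expose its internal buffer to subclasses as the protected field `buf`:

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;

// Hypothetical sketch: a BufferedOutputStream that borrows a caller-supplied
// byte[] instead of allocating its own, so one 64 KB buffer can be reused
// across all compression blocks written by the same writer.
class ReusableBufferedOutputStream extends BufferedOutputStream {
    ReusableBufferedOutputStream(final OutputStream out, final byte[] buffer) {
        super(out, 1);      // superclass allocates a throwaway 1-byte array
        this.buf = buffer;  // replace it with the shared buffer ('buf' is protected)
    }
}
```

The writer would hold on to one 64 KB array and pass it to each new stream, so per-block construction no longer allocates.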
> EvaluateJsonPath uses a BufferedInputStream, but it is not necessary, because
> the underlying library will also buffer data. So we are unnecessarily
> creating a lot of byte[]s.
> CompressContent uses Buffered Input AND Output. And uses 64 KB byte[]. And
> doesn't need them at all, because it reads and writes with its own byte[]
> buffer via StreamUtils.copy
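To see why the Buffered wrappers are redundant here, this sketch shows what a StreamUtils.copy-style loop does (the loop below is an illustrative stand-in, not NiFi's actual implementation):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: the copy loop already batches reads and writes through
// its own explicit byte[] buffer, so wrapping the streams in
// BufferedInputStream/BufferedOutputStream only adds redundant 64 KB arrays.
class CopyUtil {
    static long copy(final InputStream in, final OutputStream out) throws IOException {
        final byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```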
> Site-to-site uses CompressionInputStream. This stream creates a new byte[] in
> the readChunkHeader() method continually. We should instead only create a new
> byte[] if we need a bigger buffer and otherwise just use an offset & length
> variable.
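The offset-and-length approach could look something like this sketch (class and method names are illustrative, not NiFi's actual API):

```java
// Hypothetical sketch: keep one chunk buffer per stream and reallocate only
// when a chunk is larger than the current array; a separate length field
// tracks how many bytes of the buffer are valid for the current chunk.
class ReusableChunkBuffer {
    private byte[] buffer = new byte[0];
    private int chunkLength;

    // Returns a buffer with capacity for 'chunkSize' bytes, reusing the
    // existing array whenever it is already big enough.
    byte[] prepare(final int chunkSize) {
        if (buffer.length < chunkSize) {
            buffer = new byte[chunkSize];
        }
        chunkLength = chunkSize;
        return buffer;
    }

    int getChunkLength() {
        return chunkLength;
    }
}
```

Since most chunks in a stream are similarly sized, the steady state allocates nothing per chunk.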
> Right now, SplitText uses TextLineDemarcator. The fill() method increases the
> size of the internal byte[] by 8 KB each time. When dealing with a large
> chunk of data, this is VERY expensive on GC because we continually create a
> byte[] and then discard it to create a new one. Take for example an 800 KB
> chunk. We would do this about 100 times. If we instead double the size, we
> would only have to create about 8 of these.
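The doubling strategy can be sketched as below (a generic helper, not TextLineDemarcator's actual code). Growing from 8 KB to 800 KB by doubling takes 7 reallocations, versus roughly 100 with a fixed 8 KB increment:

```java
// Hypothetical sketch: grow a buffer geometrically (by doubling) instead of
// by a fixed 8 KB step, so the number of allocations is logarithmic in the
// final size rather than linear.
class Buffers {
    static byte[] grow(final byte[] current, final int required) {
        int newSize = Math.max(current.length, 1);
        while (newSize < required) {
            newSize *= 2;
        }
        if (newSize == current.length) {
            return current; // already big enough: no allocation, no copy
        }
        return java.util.Arrays.copyOf(current, newSize);
    }
}
```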
> Other Processors that use Buffered streams unnecessarily:
> ConvertJSONToSQL
> ExecuteProcess
> ExecuteStreamCommand
> AttributesToJSON
> EvaluateJsonPath (when writing to content)
> ExtractGrok
> JmsConsumer
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)