[
https://issues.apache.org/jira/browse/NIFI-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949503#comment-15949503
]
ASF GitHub Bot commented on NIFI-3648:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/1637#discussion_r108993347
--- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-cluster-protocol/src/main/java/org/apache/nifi/cluster/protocol/impl/SocketProtocolListener.java ---
@@ -134,15 +132,21 @@ public void dispatchRequest(final Socket socket) {
             // unmarshall message
             final ProtocolMessageUnmarshaller<ProtocolMessage> unmarshaller = protocolContext.createUnmarshaller();
-            final InputStream inStream = socket.getInputStream();
-            final CopyingInputStream copyingInputStream = new CopyingInputStream(inStream, maxMsgBuffer); // don't copy more than 1 MB
+            final ByteCountingInputStream countingIn = new ByteCountingInputStream(socket.getInputStream());
+            InputStream wrappedInStream = countingIn;
+            if (logger.isDebugEnabled()) {
+                final int maxMsgBuffer = 1024 * 1024; // don't buffer more than 1 MB of the message
+                final CopyingInputStream copyingInputStream = new CopyingInputStream(wrappedInStream, maxMsgBuffer);
+                wrappedInStream = copyingInputStream;
+            }
             final ProtocolMessage request;
             try {
-                request = unmarshaller.unmarshal(copyingInputStream);
+                request = unmarshaller.unmarshal(wrappedInStream);
             } finally {
-                receivedMessage = copyingInputStream.getBytesRead();
                 if (logger.isDebugEnabled()) {
--- End diff ---
We should probably update this to be:
`if (logger.isDebugEnabled() && wrappedInStream instanceof CopyingInputStream) {`
As is, if a user edits logback.xml to set the logger to DEBUG while this
code is executing, the first call to logger.isDebugEnabled() could return
false, in which case wrappedInStream would be a plain ByteCountingInputStream.
Then, at this point, the second logger.isDebugEnabled() check returns true,
so we would try to cast the ByteCountingInputStream to a CopyingInputStream
and throw a ClassCastException.
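A minimal, self-contained sketch of the race and the suggested instanceof guard. The stand-in stream classes below are simplifications for illustration only; the real ByteCountingInputStream and CopyingInputStream are NiFi classes with more behavior:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

public class DebugGuardSketch {

    // Stand-in for NiFi's ByteCountingInputStream (simplified to a bare wrapper).
    static class ByteCountingInputStream extends FilterInputStream {
        ByteCountingInputStream(final InputStream in) {
            super(in);
        }
    }

    // Stand-in for CopyingInputStream; per the diff, it is only created when
    // DEBUG was enabled at the moment the stream was wrapped.
    static class CopyingInputStream extends ByteCountingInputStream {
        CopyingInputStream(final InputStream in, final int maxBuffer) {
            super(in);
        }
    }

    // The guarded check markap14 suggests: the instanceof test makes the cast
    // safe even if the log level flipped to DEBUG between wrapping the stream
    // and reaching the finally block.
    static String debugCopyResult(final InputStream wrappedInStream, final boolean debugEnabledNow) {
        if (debugEnabledNow && wrappedInStream instanceof CopyingInputStream) {
            final CopyingInputStream copying = (CopyingInputStream) wrappedInStream;
            return "copied debug bytes via " + copying.getClass().getSimpleName();
        }
        return "skipped debug copy";
    }

    public static void main(final String[] args) {
        // DEBUG was off when the stream was wrapped, so we got the plain counter...
        final InputStream wrapped = new ByteCountingInputStream(new ByteArrayInputStream(new byte[0]));
        // ...but DEBUG is on by the time the finally block runs. Without the
        // instanceof guard, the cast would throw ClassCastException here.
        System.out.println(debugCopyResult(wrapped, true)); // prints "skipped debug copy"
    }
}
```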
> Address Excessive Garbage Collection
> ------------------------------------
>
> Key: NIFI-3648
> URL: https://issues.apache.org/jira/browse/NIFI-3648
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework, Extensions
> Reporter: Mark Payne
> Assignee: Mark Payne
>
> We have a lot of places in the codebase where we generate lots of unnecessary
> garbage - especially byte arrays. We need to clean this up in order to
> relieve stress on the garbage collector.
> Specific points that I've found create unnecessary garbage:
> Provenance CompressableRecordWriter creates a new BufferedOutputStream for
> each 'compression block' that it creates. Each one has a 64 KB byte[]. This
> is very wasteful. We should instead subclass BufferedOutputStream so that we
> are able to provide a byte[] to use instead of an int that indicates the
> size. This way, we can just keep re-using the same byte[] that we create for
> each writer. This saves about 32,000 of these 64 KB byte[] for each writer
> that we create. And we create more than 1 of these per minute.
> EvaluateJsonPath uses a BufferedInputStream, but it is not necessary, because
> the underlying library will also buffer data. So we are unnecessarily
> creating a lot of byte[]s.
> CompressContent uses buffered input AND output streams, each with a 64 KB
> byte[], and doesn't need them at all, because it reads and writes through its
> own byte[] buffer via StreamUtils.copy.
> Site-to-site uses CompressionInputStream. This stream creates a new byte[] in
> the readChunkHeader() method continually. We should instead only create a new
> byte[] if we need a bigger buffer and otherwise just use an offset & length
> variable.
> Right now, SplitText uses TextLineDemarcator. The fill() method increases the
> size of the internal byte[] by 8 KB each time. When dealing with a large
> chunk of data, this is VERY expensive on GC, because we continually create a
> byte[] and then discard it to create a new one. Take for example an 800 KB
> chunk: we would do this about 100 times. If we instead double the size each
> time, we would only have to create about 8 of these.
> Other Processors that use Buffered streams unnecessarily:
> ConvertJSONToSQL
> ExecuteProcess
> ExecuteStreamCommand
> AttributesToJSON
> EvaluateJsonPath (when writing to content)
> ExtractGrok
> JmsConsumer
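The SplitText / TextLineDemarcator point above can be sketched numerically. This is a simplified growth model, not NiFi's actual TextLineDemarcator code: growing a buffer by a fixed 8 KB step allocates roughly 100 arrays to reach 800 KB, while doubling the capacity allocates roughly 8.

```java
public class BufferGrowthSketch {

    // Fixed-step growth: allocate a new array every `step` bytes, discarding
    // the old one each time (the pattern the fill() method used).
    static int fixedStepAllocations(final int targetBytes, final int step) {
        int size = step;
        int allocations = 1;
        while (size < targetBytes) {
            size += step;      // grow by a fixed 8 KB...
            allocations++;     // ...one fresh array per step
        }
        return allocations;
    }

    // Doubling growth: capacity doubles on each reallocation, so the number of
    // arrays created grows logarithmically with the target size.
    static int doublingAllocations(final int targetBytes, final int initial) {
        int size = initial;
        int allocations = 1;
        while (size < targetBytes) {
            size *= 2;
            allocations++;
        }
        return allocations;
    }

    public static void main(final String[] args) {
        final int target = 800 * 1024; // an 800 KB chunk, as in the description
        final int step = 8 * 1024;     // 8 KB growth increment
        System.out.println("fixed-step: " + fixedStepAllocations(target, step)); // prints 100
        System.out.println("doubling:   " + doublingAllocations(target, step));  // prints 8
    }
}
```

The point of the comparison: linear growth does O(n) allocations (and O(n^2) bytes copied overall), while doubling does O(log n) allocations, which is why the doubling strategy relieves GC pressure so dramatically.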
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)