[
https://issues.apache.org/jira/browse/NIFI-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949677#comment-15949677
]
ASF GitHub Bot commented on NIFI-3648:
--------------------------------------
Github user mosermw commented on a diff in the pull request:
https://github.com/apache/nifi/pull/1637#discussion_r109018746
--- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-cluster-protocol/src/main/java/org/apache/nifi/cluster/protocol/impl/SocketProtocolListener.java ---
@@ -134,15 +132,21 @@ public void dispatchRequest(final Socket socket) {
// unmarshall message
final ProtocolMessageUnmarshaller<ProtocolMessage> unmarshaller = protocolContext.createUnmarshaller();
- final InputStream inStream = socket.getInputStream();
- final CopyingInputStream copyingInputStream = new CopyingInputStream(inStream, maxMsgBuffer); // don't copy more than 1 MB
+ final ByteCountingInputStream countingIn = new ByteCountingInputStream(socket.getInputStream());
+ InputStream wrappedInStream = countingIn;
+ if (logger.isDebugEnabled()) {
+ final int maxMsgBuffer = 1024 * 1024; // don't buffer more than 1 MB of the message
+ final CopyingInputStream copyingInputStream = new CopyingInputStream(wrappedInStream, maxMsgBuffer);
+ wrappedInStream = copyingInputStream;
+ }
final ProtocolMessage request;
try {
- request = unmarshaller.unmarshal(copyingInputStream);
+ request = unmarshaller.unmarshal(wrappedInStream);
} finally {
- receivedMessage = copyingInputStream.getBytesRead();
if (logger.isDebugEnabled()) {
--- End diff --
Excellent! I thought of using instanceof CopyingInputStream but didn't
think it would help. The potential race condition makes it useful and
necessary. I pushed a fix and will squash if needed.
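
For readers following the review, here is a minimal, self-contained sketch of the pattern in the diff: always count bytes, only copy them when debug logging is on, and use instanceof when retrieving the copy. The classes CountingStream and CopyingStream are hypothetical stand-ins written for this example, not NiFi's ByteCountingInputStream or CopyingInputStream, and the debugEnabled flag stands in for logger.isDebugEnabled(). The assumption here is that the race being discussed is the log level changing between the two checks; the instanceof test does not depend on a second check.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ConditionalWrapSketch {

    /** Hypothetical stand-in: counts bytes as they are read. */
    static class CountingStream extends FilterInputStream {
        private long bytesRead;
        CountingStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            int b = super.read();
            if (b >= 0) bytesRead++;
            return b;
        }
        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) bytesRead += n;
            return n;
        }
        long getBytesRead() { return bytesRead; }
    }

    /** Hypothetical stand-in: keeps a copy of at most maxBytes of what is read. */
    static class CopyingStream extends FilterInputStream {
        private final ByteArrayOutputStream copy = new ByteArrayOutputStream();
        private final int maxBytes;
        CopyingStream(InputStream in, int maxBytes) { super(in); this.maxBytes = maxBytes; }
        @Override public int read() throws IOException {
            int b = super.read();
            if (b >= 0 && copy.size() < maxBytes) copy.write(b);
            return b;
        }
        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            int room = maxBytes - copy.size();
            if (n > 0 && room > 0) copy.write(buf, off, Math.min(n, room));
            return n;
        }
        byte[] getBytesRead() { return copy.toByteArray(); }
    }

    /** Holds the byte count and the optional copy (null when nothing was copied). */
    static final class Result {
        final long byteCount;
        final byte[] copied;
        Result(long byteCount, byte[] copied) { this.byteCount = byteCount; this.copied = copied; }
    }

    static Result readAll(InputStream source, boolean debugEnabled, int maxCopy) {
        CountingStream counting = new CountingStream(source);
        InputStream wrapped = counting;
        if (debugEnabled) {
            wrapped = new CopyingStream(wrapped, maxCopy); // only pay for the copy when debugging
        }
        try {
            byte[] scratch = new byte[4];
            while (wrapped.read(scratch) != -1) {
                // drain the stream; a real caller would unmarshal here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Decide by the actual type of the stream rather than re-checking the flag.
        byte[] copied = null;
        if (wrapped instanceof CopyingStream) {
            copied = ((CopyingStream) wrapped).getBytesRead();
        }
        return new Result(counting.getBytesRead(), copied);
    }

    public static void main(String[] args) {
        Result quiet = readAll(new ByteArrayInputStream(new byte[10]), false, 4);
        Result debug = readAll(new ByteArrayInputStream(new byte[10]), true, 4);
        System.out.println(quiet.byteCount + " bytes, copy present: " + (quiet.copied != null));
        System.out.println(debug.byteCount + " bytes, copied: " + debug.copied.length);
    }
}
```

With debug off, no copy buffer is ever allocated, which is the point of the change under NIFI-3648.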
> Address Excessive Garbage Collection
> ------------------------------------
>
> Key: NIFI-3648
> URL: https://issues.apache.org/jira/browse/NIFI-3648
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework, Extensions
> Reporter: Mark Payne
> Assignee: Mark Payne
>
> We have a lot of places in the codebase where we generate lots of unnecessary
> garbage - especially byte arrays. We need to clean this up in order to
> relieve stress on the garbage collector.
> Specific points that I've found create unnecessary garbage:
> Provenance CompressableRecordWriter creates a new BufferedOutputStream for
> each 'compression block' that it creates. Each one has a 64 KB byte[]. This
> is very wasteful. We should instead subclass BufferedOutputStream so that we
> are able to provide a byte[] to use instead of an int that indicates the
> size. This way, we can just keep re-using the same byte[] that we create for
> each writer. This saves about 32,000 of these 64 KB byte[] for each writer
> that we create. And we create more than 1 of these per minute.
> EvaluateJsonPath uses a BufferedInputStream but it is not necessary, because
> the underlying library will also buffer data. So we are unnecessarily
> creating a lot of byte[]'s
> CompressContent uses Buffered Input AND Output. And uses 64 KB byte[]. And
> doesn't need them at all, because it reads and writes with its own byte[]
> buffer via StreamUtils.copy
> Site-to-site uses CompressionInputStream. This stream creates a new byte[] in
> the readChunkHeader() method continually. We should instead only create a new
> byte[] if we need a bigger buffer and otherwise just use an offset & length
> variable.
> Right now, SplitText uses TextLineDemarcator. The fill() method increases the
> size of the internal byte[] by 8 KB each time. When dealing with a large
> chunk of data, this is VERY expensive on GC because we continually create a
> byte[] and then discard it to create a new one. Take for example an 800 KB
> chunk. We would do this 100 times. If we instead double the size we would
> only have to create 8 of these.
> Other Processors that use Buffered streams unnecessarily:
> ConvertJSONToSQL
> ExecuteProcess
> ExecuteStreamCommand
> AttributesToJSON
> EvaluateJsonPath (when writing to content)
> ExtractGrok
> JmsConsumer
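
The buffer-growth point in the description (fixed 8 KB steps versus doubling) can be checked with a small sketch. This is not TextLineDemarcator's code; the class and method names below are hypothetical, and the sketch only counts how many arrays each strategy would allocate, including the initial one, before the buffer can hold an 800 KB chunk.

```java
public class BufferGrowthSketch {

    /** Counts arrays allocated when the buffer grows by a fixed increment. */
    static int allocationsFixed(int initialSize, int increment, int targetSize) {
        int size = initialSize;
        int allocations = 1; // the initial array
        while (size < targetSize) {
            size += increment; // each growth step allocates a new, larger array
            allocations++;
        }
        return allocations;
    }

    /** Counts arrays allocated when the buffer doubles on each growth. */
    static int allocationsDoubling(int initialSize, int targetSize) {
        int size = initialSize;
        int allocations = 1; // the initial array
        while (size < targetSize) {
            size *= 2;
            allocations++;
        }
        return allocations;
    }

    public static void main(String[] args) {
        int eightKb = 8 * 1024;
        int chunk = 800 * 1024;
        System.out.println("fixed 8 KB steps: " + allocationsFixed(eightKb, eightKb, chunk));   // prints 100
        System.out.println("doubling:         " + allocationsDoubling(eightKb, chunk));          // prints 8
    }
}
```

Doubling overshoots (the final array is 1 MB rather than 800 KB), so it trades some slack space for far fewer discarded arrays.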