[
https://issues.apache.org/jira/browse/NIFI-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361912#comment-17361912
]
Mark Payne commented on NIFI-8689:
----------------------------------
I ran some performance tests. When sending from my laptop back to itself, with
the following flow:
GenerateFlowFile (100 bytes) -> RPG (back to self)
Input Port -> UpdateAttribute
The throughput that I got when using 3 threads for RPG and 3 threads for
InputPort was 3.3 MM FlowFiles/5 minutes
With 1 thread, I got 3.4 MM FlowFiles/5 mins
I then moved the InputPort -> UpdateAttribute flow to a different NiFi cluster,
where there was a high latency link (about 65 to 80 ms RTT from ping).
Using 3 threads, I got 805,000 FlowFiles/5 mins
This is summarized here:
||Number of Threads||Destination||Number of FlowFiles Sent in 5 minutes||
|1|Local|3.3 million|
|3|Local|3.4 million|
|3|Remote|805,000|
Then I updated the code to avoid those extraneous buffer flushes and saw the
following performance:
||Number of Threads||Destination||Number of FlowFiles Sent in 5 minutes||Percent Improvement||
|1|Local|4.64 million|40%|
|3|Local|5.46 million|65%|
|3|Remote|2.58 million|320%|
The performance was similar using the HTTP-based protocol.
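To illustrate why the extra flushes matter, here is a minimal sketch. It is not the actual Site-to-Site client code: the class name, the marker values, and the attribute encoding are all hypothetical, and a counting stream stands in for the socket. It writes 100-byte FlowFiles through a BufferedOutputStream and counts how many times data is pushed down to the underlying stream, once with a flush after every marker and FlowFile, and once with a single flush at the end of the transaction.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class FlushSketch {

    /** Stand-in for the socket's OutputStream: counts writes that reach it. */
    static class CountingOutputStream extends OutputStream {
        int writes = 0;

        @Override
        public void write(int b) {
            writes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writes++;
        }
    }

    // Hypothetical marker values, used only for this illustration.
    static final int FLOWFILE_FOLLOWS = 1;
    static final int FINISHED_TRANSACTION = 2;

    /** Sends flowFileCount small FlowFiles and returns the number of underlying writes. */
    public static int send(int flowFileCount, boolean flushPerFlowFile) throws IOException {
        CountingOutputStream socket = new CountingOutputStream();
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(socket, 8192));
        byte[] attributes = "filename=example".getBytes(StandardCharsets.UTF_8);
        byte[] content = new byte[100];

        for (int i = 0; i < flowFileCount; i++) {
            out.writeByte(FLOWFILE_FOLLOWS);
            if (flushPerFlowFile) {
                out.flush(); // flush after the "FlowFile follows" marker
            }
            out.writeInt(attributes.length);
            out.write(attributes);
            out.writeInt(content.length);
            out.write(content);
            if (flushPerFlowFile) {
                out.flush(); // flush again after each FlowFile
            }
        }
        out.writeByte(FINISHED_TRANSACTION);
        out.flush(); // one flush at the end of the transaction
        return socket.writes;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("flush per FlowFile:    " + send(1000, true) + " writes");
        System.out.println("flush per transaction: " + send(1000, false) + " writes");
    }
}
```

With per-FlowFile flushing, every FlowFile costs two writes to the underlying stream regardless of its size. With a single flush per transaction, the buffer is only written out when it fills, so the number of writes depends on the total bytes sent rather than on the number of FlowFiles.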
> Site-to-Site client is constantly flushing the socket's OutputStream
> --------------------------------------------------------------------
>
> Key: NIFI-8689
> URL: https://issues.apache.org/jira/browse/NIFI-8689
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
>
> When a RemoteProcessGroup is sending data to another NiFi instance, the
> protocol should establish a transaction and then send a sequence of FlowFiles
> following a pattern along the lines of:
> {code:java}
> <FlowFile Follows><FlowFile Attributes><FlowFile Content>
> <FlowFile Follows><FlowFile Attributes><FlowFile Content>
> <FlowFile Follows><FlowFile Attributes><FlowFile Content>
> <Finished Transaction>{code}
> However, currently, the protocol is flushing the Socket's output buffer each
> time that it indicates that a FlowFile follows, and again after each
> FlowFile. So it's more like:
> {code:java}
> <FlowFile Follows>*Flush Buffer*
> <FlowFile Attributes><FlowFile Content>*Flush Buffer*
> <FlowFile Follows>*Flush Buffer*
> <FlowFile Attributes><FlowFile Content>*Flush Buffer*
> <FlowFile Follows>*Flush Buffer*
> <FlowFile Attributes><FlowFile Content>*Flush Buffer*{code}
> As a result, when sending a large number of smaller FlowFiles, we end up
> constantly flushing data to the socket, which results in dramatically worse
> performance.