[ 
https://issues.apache.org/jira/browse/NIFI-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361912#comment-17361912
 ] 

Mark Payne commented on NIFI-8689:
----------------------------------

I ran some performance tests. When sending from my laptop back to itself, with 
the following flow:

GenerateFlowFile (100 bytes) -> RPG (back to self)

Input Port -> UpdateAttribute

The throughput that I got when using 3 threads for RPG and 3 threads for 
InputPort was 3.3 MM FlowFiles/5 minutes

With 1 thread, I got 3.4 MM FlowFiles/5 mins

I then moved the InputPort -> UpdateAttribute flow to a different NiFi cluster, 
where there was a high latency link (about 65 to 80 ms RTT from ping).

Using 3 threads, I got 805,000 FlowFiles/5 mins

This is summarized here:
||Number of Threads||Destination||Number of FlowFiles Sent in 5 minutes||
|1|Local|3.3 million|
|3|Local|3.4 million|
|3|Remote|805,000|

Then I updated the code to avoid those extraneous buffer flushes and saw the 
following performance:
||Number of Threads||Destination||Number of FlowFiles Sent in 5 minutes||Percent Improvement||
|1|Local|4.64 million|40%|
|3|Local|5.46 million|65%|
|3|Remote|2.58 million|220% (3.2x)|

The performance was similar using the HTTP-based protocol.

> Site-to-Site client is constantly flushing the socket's OutputStream
> --------------------------------------------------------------------
>
>                 Key: NIFI-8689
>                 URL: https://issues.apache.org/jira/browse/NIFI-8689
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>
> When a RemoteProcessGroup is sending data to another NiFi instance, the 
> protocol should establish a transaction and then send a sequence of FlowFiles 
> following a pattern along the lines of:
> {code:java}
> <FlowFile Follows><FlowFile Attributes><FlowFile Content>
> <FlowFile Follows><FlowFile Attributes><FlowFile Content>
> <FlowFile Follows><FlowFile Attributes><FlowFile Content>
> <Finished Transaction>{code}
> However, currently, the protocol is flushing the Socket's output buffer each 
> time that it indicates that a FlowFile follows, and again after each 
> FlowFile. So it's more like:
> {code:java}
> <FlowFile Follows>*Flush Buffer*
> <FlowFile Attributes><FlowFile Content>*Flush Buffer*
> <FlowFile Follows>*Flush Buffer*
> <FlowFile Attributes><FlowFile Content>*Flush Buffer*
> <FlowFile Follows>*Flush Buffer*
> <FlowFile Attributes><FlowFile Content>*Flush Buffer*{code}
> As a result, when sending a large number of smaller FlowFiles, we end up 
> constantly flushing data to the socket, which results in dramatically worse 
> performance.
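
The difference between the two patterns above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not NiFi's actual site-to-site code: the class name FlushDemo, the CountingStream helper, the marker byte values, and the attribute encoding are all hypothetical. A ByteArrayOutputStream stands in for the socket, and the sketch simply counts how often the buffered stream pushes data down to it.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class FlushDemo {
    // Hypothetical marker bytes; the real protocol's codes are not given in this issue.
    static final int FLOWFILE_FOLLOWS = 1;
    static final int FINISHED_TRANSACTION = 2;

    /** Stand-in for a socket stream: counts each write that reaches it. */
    static class CountingStream extends FilterOutputStream {
        int writes = 0;

        CountingStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            writes++;
            out.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            writes++;
            out.write(b, off, len);
        }
    }

    /**
     * Sends flowFiles small FlowFiles through an 8 KB buffer and returns how many
     * writes reached the underlying stream. When flushEachStep is true, the buffer
     * is flushed after every "FlowFile Follows" marker and after every FlowFile.
     */
    static int countSocketWrites(int flowFiles, int contentSize, boolean flushEachStep) {
        CountingStream socket = new CountingStream(new ByteArrayOutputStream());
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(socket, 8192));
        byte[] content = new byte[contentSize];
        try {
            for (int i = 0; i < flowFiles; i++) {
                out.writeByte(FLOWFILE_FOLLOWS);
                if (flushEachStep) {
                    out.flush();
                }
                out.writeUTF("filename=" + i); // stand-in for the FlowFile attributes
                out.writeInt(content.length);
                out.write(content);            // FlowFile content
                if (flushEachStep) {
                    out.flush();
                }
            }
            out.writeByte(FINISHED_TRANSACTION);
            out.flush(); // a single flush at the end of the transaction
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return socket.writes;
    }

    public static void main(String[] args) {
        System.out.println("flush per step: " + countSocketWrites(100, 100, true));
        System.out.println("flush once:     " + countSocketWrites(100, 100, false));
    }
}
```

With 100 FlowFiles of 100 bytes each, the flush-per-step variant pushes data to the underlying stream 201 times (two per FlowFile plus the final marker), while the batched variant only does so when the 8 KB buffer fills and once at the end. On a real socket, each of those pushes is a separate small write, which is where the overhead for small FlowFiles comes from.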



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
