[
https://issues.apache.org/jira/browse/NIFI-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901485#comment-15901485
]
Matthew Clarke commented on NIFI-3559:
--------------------------------------
Koji Kawamura, adding a configurable max batch size will go a long way toward
improving S2S load-balancing. M Payne had some additional ideas to make it
even better. Let's say you have a 12-node cluster and 1000 small FlowFiles
queued to your RPG. As it works now, it is likely that all 1000 FlowFiles will
end up being delivered to only one node within the 0.5 seconds allotted (no
load-balancing). Now add the max batch size change and set it to 100, for
example. In the same scenario, only 10 of the 12 nodes would get the data.
That is still not completely load-balanced. The source NiFi with the RPG knows
how many nodes exist in the target cluster, so it should be able to calculate a
better distribution of FlowFiles using the number of nodes available and the
load information it has.
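
To make the arithmetic above concrete, here is a rough sketch in Python. It is
not NiFi code: the function names are hypothetical, and it assumes the RPG
visits target nodes in simple round-robin order, sends at most one batch per
connection, and ignores both the 0.5 second window and any load information.

```python
from math import ceil


def distribute(queued, nodes, batch_size):
    """Hand out `queued` FlowFiles round-robin, at most `batch_size` per
    connection, and return how many FlowFiles each node receives.
    (Hypothetical model, not the actual S2S peer selection.)"""
    per_node = [0] * nodes
    node = 0
    remaining = queued
    while remaining > 0:
        sent = min(batch_size, remaining)
        per_node[node] += sent
        remaining -= sent
        node = (node + 1) % nodes  # move on to the next target node
    return per_node


def node_aware_batch_size(queued, nodes):
    """Assumed heuristic: size the batch so the queue spans every node."""
    return max(1, ceil(queued / nodes))


# Fixed max batch size of 100: only 10 of the 12 nodes receive data.
fixed = distribute(1000, 12, 100)
print(fixed)   # [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 0, 0]

# Batch size derived from queue depth and node count: all 12 nodes get data.
aware = distribute(1000, 12, node_aware_batch_size(1000, 12))
print(aware)   # [84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 84, 76]
```

With a batch size computed from the queue depth and node count, every node in
this simplified model receives a share. A real implementation would also have
to account for the queue changing while transfers are in progress.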
> Improve S2S load-balancing
> --------------------------
>
> Key: NIFI-3559
> URL: https://issues.apache.org/jira/browse/NIFI-3559
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Affects Versions: 1.1.1
> Reporter: Matthew Clarke
> Assignee: Koji Kawamura
>
> The current implementation of S2S sends data continuously to the destination
> NiFi node for 0.5 seconds before closing the connection and opening a new
> connection to another node.
> When the source FlowFiles are all very small (0 bytes in the case of
> list-based processors), the entire queue can end up getting sent to only one
> of the target NiFi cluster nodes.
> Another common use case for S2S is to have an RPG pointed back at the same
> cluster where the RPG was added. Since FlowFiles are likely to transfer to the
> same node where the data originates (think primary node data redistribution
> within a cluster) much faster than transfers to other nodes, the primary node
> is likely to always end up with more FlowFiles than any other node.
> There needs to be an additional load-balancing strategy that complements the
> existing 0.5 second window to improve load-balancing in such cases. The
> RPG knows how many target nodes there are and how many FlowFiles exist in the
> queue at run time, so perhaps using that info to split the queue more evenly
> among all nodes would help.
> This is related to existing Jira: NIFI-2987