[
https://issues.apache.org/jira/browse/NIFI-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031835#comment-17031835
]
Tamas Palfy commented on NIFI-7081:
-----------------------------------
[~markap14], [~joewitt]
I have tested a setup that is similar to my previous idea. The main differences
are:
# New balancing strategy added instead of changing round robin (doesn't
really affect functionality)
# When checking if a connection is full, it takes into consideration what the
balancing strategy is
The result works. Everything behaves the same as before with the existing
balancing strategies, and the new one balances among the available nodes.
Backpressure is also applied... except the thresholds are probably not where we
would want them with the new strategy.
Given
N = number of nodes
Q = "Back Pressure Object Threshold" set on the connection
Consumer processor is not running
If the producer processor runs on all nodes, backpressure kicks in at N*N*Q.
If the producer processor runs on primary only, backpressure kicks in at
(2N-1)*Q.
I guess it makes sense: in the first case each of the N nodes has a Q-sized
buffer for all N nodes - itself (local partition) and the sibling nodes
(remote partitions).
In the second case, if I understand correctly, the primary node can actually
send up to Q flowfiles to each sibling node (which will be stored in their
local partitions, I presume) - that's the (N-1)*Q - and also has its own N*Q
of local buffers (the 1 local and N-1 remote partitions).
(These are not just theoretical values btw, I did some measurements.)
Not sure if those increased thresholds could work for us.
As for me, I think running a processor on all nodes with a load-balanced
connection hardly makes sense (why not have each node handle its own load as
normal), and (2N-1)*Q instead of N*Q in the primary-only case doesn't sound
that terrible - only a constant factor-of-2 increase.
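The threshold arithmetic above can be sketched in plain Java (illustrative names only, not NiFi code; it just encodes the two formulas from the measurements):

```java
// Sketch of the backpressure-threshold arithmetic described above.
// Method names are made up for illustration; this is not a NiFi API.
public class BackpressureThresholds {

    // Producer runs on all nodes: each of the N nodes keeps a Q-sized
    // buffer per partition (1 local + N-1 remote = N partitions), so
    // backpressure kicks in cluster-wide at N * N * Q.
    static long allNodesThreshold(int n, long q) {
        return (long) n * n * q;
    }

    // Producer runs on primary only: the primary fills its own N
    // partitions (N * Q) and can additionally push Q flowfiles into each
    // sibling's local partition ((N-1) * Q), giving (2N - 1) * Q total.
    static long primaryOnlyThreshold(int n, long q) {
        return (2L * n - 1) * q;
    }

    public static void main(String[] args) {
        int n = 3;        // 3-node cluster
        long q = 10_000L; // Back Pressure Object Threshold per connection
        System.out.println("all nodes:    " + allNodesThreshold(n, q));    // 90000
        System.out.println("primary only: " + primaryOnlyThreshold(n, q)); // 50000
    }
}
```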
> Improve handling of Load Balanced Connections when one node is slow
> -------------------------------------------------------------------
>
> Key: NIFI-7081
> URL: https://issues.apache.org/jira/browse/NIFI-7081
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Tamas Palfy
> Priority: Major
>
> When a connection is configured to use Round Robin load balancing, the
> FlowFile Queue works by queuing up one FlowFile to be processed locally, one
> to be sent to Node 2, one to be sent to Node 3, the next one to be locally
> processed, etc. (in this case, assuming a 3-node cluster).
> If one node in a cluster is slow, though, we can have a situation where the
> local partition is empty and the partition for Node 2 is empty. But Node 3's
> partition is full, because Node 3 is not processing the data quickly enough.
> As a result, on Node 1, the queue ends up applying backpressure, with all
> FlowFiles in the queue waiting to be pushed to Node 3.
> In such a situation, we end up preventing any data from being processed by
> Node 1 or Node 2. It would be advantageous to improve this so that Node 1 and
> Node 2 could still be busy processing data.
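The round-robin dealing described in the quoted issue can be modeled roughly as below (illustrative Java with a hypothetical class name, not the actual FlowFile Queue implementation); it shows how a slow node's partition fills while strict round robin keeps feeding it:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of round-robin load-balanced partitions; not NiFi code.
public class RoundRobinQueueSketch {

    private final List<Queue<String>> partitions = new ArrayList<>();
    private int next = 0;

    RoundRobinQueueSketch(int nodeCount) {
        // Partition 0 is the local partition; 1..N-1 are the remote ones.
        for (int i = 0; i < nodeCount; i++) {
            partitions.add(new ArrayDeque<>());
        }
    }

    // FlowFiles are dealt to partitions strictly in turn, regardless of
    // how quickly each destination node drains its partition.
    void offer(String flowFile) {
        partitions.get(next).add(flowFile);
        next = (next + 1) % partitions.size();
    }

    int partitionSize(int partition) {
        return partitions.get(partition).size();
    }

    public static void main(String[] args) {
        RoundRobinQueueSketch queue = new RoundRobinQueueSketch(3);
        for (int i = 0; i < 9; i++) {
            queue.offer("ff-" + i);
        }
        // If nobody drains partition 2 (the slow Node 3), its FlowFiles
        // accumulate; once the queue's threshold is hit, backpressure
        // stalls data destined for Node 1 and Node 2 as well.
        System.out.println(queue.partitionSize(2)); // 3
    }
}
```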
--
This message was sent by Atlassian Jira
(v8.3.4#803005)