[
https://issues.apache.org/jira/browse/NIFI-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031835#comment-17031835
]
Tamas Palfy commented on NIFI-7081:
-----------------------------------
[~markap14], [~joewitt]
I have tested a setup that is similar to my previous idea. The main differences
are:
# New balancing strategy added instead of changing round robin (doesn't
really affect functionality)
# When checking if a connection is full, it takes into consideration what the
balancing strategy is
The result works. Everything behaves the same as before with the existing
balancing strategies, and the new one balances among the available nodes.
Backpressure is also applied... except the thresholds are probably not where we
would want them with the new strategy.
Given
N = number of nodes
Q = "Back Pressure Object Threshold" set on the connection
Consumer processor is not running
If the producer processor runs on all nodes, backpressure kicks in at N*N*Q.
If the producer processor runs on primary only, backpressure kicks in at
(2N-1)*Q.
I guess it makes sense: in the first case each of the N nodes has a Q-sized
buffer for all N nodes - itself (local partition) and the sibling nodes
(remote partitions).
In the second case, if I understand correctly, the primary node can actually
send up to Q flowfiles to each sibling node (which will be stored in their
local partitions, I presume) - that's the (N-1)*Q - and also has its own N*Q
of local buffers (the 1 local and N-1 remote partitions).
(These are not just theoretical values btw, I did some measurements.)
Not sure if those increased thresholds could work for us.
As for me, I think running a processor on all nodes with a load-balanced
connection hardly makes sense (why not have each node handle its own load as
normal), and (2N-1)*Q instead of N*Q in the primary-only case doesn't sound
that terrible - only a constant factor-of-2 increase.
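The threshold arithmetic above can be sketched in plain Java (illustrative names only, not NiFi code; it just encodes the two formulas from the measurements):

```java
// Sketch of the backpressure-threshold arithmetic described above.
// Method names are made up for illustration; this is not a NiFi API.
public class BackpressureThresholds {

    // Producer runs on all nodes: each of the N nodes keeps a Q-sized
    // buffer per partition (1 local + N-1 remote = N partitions), so
    // backpressure kicks in cluster-wide at N * N * Q.
    static long allNodesThreshold(int n, long q) {
        return (long) n * n * q;
    }

    // Producer runs on primary only: the primary fills its own N
    // partitions (N * Q) and can additionally push Q flowfiles into each
    // sibling's local partition ((N-1) * Q), giving (2N - 1) * Q total.
    static long primaryOnlyThreshold(int n, long q) {
        return (2L * n - 1) * q;
    }

    public static void main(String[] args) {
        int n = 3;        // 3-node cluster
        long q = 10_000L; // Back Pressure Object Threshold per connection
        System.out.println("all nodes:    " + allNodesThreshold(n, q));    // 90000
        System.out.println("primary only: " + primaryOnlyThreshold(n, q)); // 50000
    }
}
```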
> Improve handling of Load Balanced Connections when one node is slow
> -------------------------------------------------------------------
>
> Key: NIFI-7081
> URL: https://issues.apache.org/jira/browse/NIFI-7081
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Tamas Palfy
> Priority: Major
>
> When a connection is configured to use Round Robin load balancing, the
> FlowFile Queue works by queuing up one FlowFile to be processed locally, one
> to be sent to Node 2, one to be sent to Node 3, the next one to be locally
> processed, etc. (in this case, assuming a 3-node cluster).
> If one node in a cluster is slow, though, we can have a situation where the
> local partition is empty and the partition for Node 2 is empty. But Node 3's
> partition is full, because Node 3 is not processing the data quickly enough.
> As a result, on Node 1, the queue ends up applying backpressure, with all
> FlowFiles in the queue waiting to be pushed to Node 3.
> In such a situation, we end up preventing any data from being processed by
> Node 1 or Node 2. It would be advantageous to improve this so that Node 1 and
> Node 2 could still be busy processing data.
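The round-robin dealing described in the quoted issue can be modeled roughly as below (illustrative Java with a hypothetical class name, not the actual FlowFile Queue implementation); it shows how a slow node's partition fills while strict round robin keeps feeding it:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of round-robin load-balanced partitions; not NiFi code.
public class RoundRobinQueueSketch {

    private final List<Queue<String>> partitions = new ArrayList<>();
    private int next = 0;

    RoundRobinQueueSketch(int nodeCount) {
        // Partition 0 is the local partition; 1..N-1 are the remote ones.
        for (int i = 0; i < nodeCount; i++) {
            partitions.add(new ArrayDeque<>());
        }
    }

    // FlowFiles are dealt to partitions strictly in turn, regardless of
    // how quickly each destination node drains its partition.
    void offer(String flowFile) {
        partitions.get(next).add(flowFile);
        next = (next + 1) % partitions.size();
    }

    int partitionSize(int partition) {
        return partitions.get(partition).size();
    }

    public static void main(String[] args) {
        RoundRobinQueueSketch queue = new RoundRobinQueueSketch(3);
        for (int i = 0; i < 9; i++) {
            queue.offer("ff-" + i);
        }
        // If nobody drains partition 2 (the slow Node 3), its FlowFiles
        // accumulate; once the queue's threshold is hit, backpressure
        // stalls data destined for Node 1 and Node 2 as well.
        System.out.println(queue.partitionSize(2)); // 3
    }
}
```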
--
This message was sent by Atlassian Jira
(v8.3.4#803005)