[ https://issues.apache.org/jira/browse/NIFI-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478887#comment-17478887 ]

Denis Jakupovic commented on NIFI-9598:
---------------------------------------

Definitely agree. Labeling would be a game changer for NiFi development, also
on AWS with different EC2 instances. With labeling, the primary node, for
example, could be used only for critical processes, with the relevant data
then partitioned to other nodes by label.
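A minimal sketch of the labeling idea, assuming a hypothetical mapping of node names to labels. NiFi has no node-labeling feature today, so CLUSTER, nodes_with_label, and label_round_robin are illustrative names only. The primary node handles the critical step and then round-robins its output only across the nodes labeled "batch".

```python
from itertools import cycle

# Hypothetical cluster: node name -> set of labels (not a real NiFi setting).
CLUSTER = {
    "node1": {"primary", "critical"},
    "node2": {"batch"},
    "node3": {"batch"},
    "node4": {"streaming"},
}


def nodes_with_label(cluster, label):
    """Return the sorted names of the nodes that carry `label`."""
    return sorted(name for name, labels in cluster.items() if label in labels)


def label_round_robin(cluster, label):
    """Return an endless round-robin iterator over nodes carrying `label`."""
    targets = nodes_with_label(cluster, label)
    if not targets:
        raise ValueError(f"no node carries label {label!r}")
    return cycle(targets)


# The primary does the critical work, then partitions results to 'batch' nodes.
rr = label_round_robin(CLUSTER, "batch")
assignments = [next(rr) for _ in range(5)]
print(assignments)  # ['node2', 'node3', 'node2', 'node3', 'node2']
```

Adding another node labeled "batch" would widen the distribution automatically, while nodes with other labels never receive this flow's data.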

It's always hard to explain to customers why they should provision another
cluster to partition the dataflows instead of just adding more nodes...

> Load Balancing on labeled nodes and/or fixed amount of usable nodes in 
> process groups
> -------------------------------------------------------------------------------------
>
>                 Key: NIFI-9598
>                 URL: https://issues.apache.org/jira/browse/NIFI-9598
>             Project: Apache NiFi
>          Issue Type: Improvement
>    Affects Versions: 1.15.3
>            Reporter: Denis Jakupovic
>            Priority: Trivial
>
> One of NiFi's great features is its linear scalability: you just add more
> nodes. However, with only the DistributeLoad processor and the connection's
> load-balancing strategies (round robin, partition by attribute, single node),
> we need a more granular way of distributing FlowFiles through the cluster.
> Let's assume we have a 10-node NiFi cluster:
> Round Robin: each node gets 1/10 of the FlowFiles.
> Single Node: only one node processes all FlowFiles. The chance that other
> process groups distribute to the same node is 1/10.
> By Attribute: anywhere from 1 to 10 nodes could get the data, and it is not
> evenly partitioned.
> DistributeLoad processor: a manual, fixed process that cannot scale by adding
> more nodes to the cluster and needs 
> With several dataflows serving different use cases, and with enormous
> variance in computation between them, one or a few dataflows can slow down
> all the other dataflows. A solution could therefore be partitioning the data
> to labeled nodes, or setting a maximum number of nodes to use for FlowFile
> partitioning/load balancing on a process group or a connection.
> In the cluster configuration, each node could be labeled. Round-robin
> distribution of FlowFiles would then only target nodes carrying the proper
> label. Distribution by attribute name, by contrast, means building the
> attribute accordingly, and it cannot be built dynamically.
> Another great feature would be a maximum number of nodes a process group
> can use to distribute FlowFiles.
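To make the 10-node comparison concrete, here is an idealized simulation. It assumes perfectly even round robin (NiFi's real round robin works on batches and reacts to back pressure) and uses a crc32-mod-N hash as a stand-in for "Partition by attribute", which is not NiFi's actual partitioner. The capped_round_robin function and the node labels are hypothetical, modeling the proposal of a label plus a maximum node count.

```python
import zlib
from collections import Counter

NODES = [f"node{i}" for i in range(1, 11)]  # the 10-node cluster from the example


def round_robin(flowfiles, nodes):
    """Idealized round robin: each node gets 1/len(nodes) of the FlowFiles."""
    return Counter(nodes[i % len(nodes)] for i in range(len(flowfiles)))


def partition_by_attribute(flowfiles, nodes, attr):
    """Same attribute value -> same node (crc32 mod N, illustration only)."""
    return Counter(
        nodes[zlib.crc32(ff[attr].encode()) % len(nodes)] for ff in flowfiles
    )


def capped_round_robin(flowfiles, nodes, labels, label, max_nodes):
    """Hypothetical proposal: round robin over at most `max_nodes` nodes
    that carry `label`."""
    eligible = [n for n in nodes if label in labels.get(n, set())][:max_nodes]
    if not eligible:
        raise ValueError("no eligible nodes")
    return round_robin(flowfiles, eligible)


# 1000 FlowFiles whose 'customer' attribute takes only three distinct values.
flowfiles = [{"customer": f"c{i % 3}"} for i in range(1000)]

rr = round_robin(flowfiles, NODES)
print(rr["node1"])  # 100

by_attr = partition_by_attribute(flowfiles, NODES, "customer")
print(len(by_attr) <= 3)  # True: at most 3 of 10 nodes receive any data

# Hypothetical labels: six nodes may run the heavy flow, but it is capped at 4.
labels = {n: {"heavy"} for n in NODES[:6]}
capped = capped_round_robin(flowfiles, NODES, labels, "heavy", max_nodes=4)
print(sorted(capped.items()))
# [('node1', 250), ('node2', 250), ('node3', 250), ('node4', 250)]
```

The capped variant keeps a computation-heavy flow on a bounded set of labeled nodes, so it cannot spill onto every node the way plain round robin does, yet it stays even across the nodes it is allowed to use, unlike partitioning by an attribute with few distinct values.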



--
This message was sent by Atlassian Jira
(v8.20.1#820001)