I'd like to start a discussion on making NiFi clustering more robust. As it stands, a cluster is very beneficial from a management standpoint: a change only needs to be made in one place - any place - and it is replicated across the cluster. From a data perspective, however, the same flexibility is not available. Data is node-specific, and a flowfile will only ever exist on a single node - think single point of failure. It would be desirable if data could be shuffled over to the same point in the flow on another (less busy) node to complete processing.
There are many factors to consider, such as the time it would take to transfer flowfiles to another node versus simply waiting for a transient spike in load to return to normal levels. But consider the other extreme, where a node crashes or is otherwise removed from the cluster while its file system remains available: the data could be recovered from the problematic node and processed by the surviving nodes. This would also make it possible to dynamically spin up _and down_ additional NiFi instances on demand, without having to wait for a node to bleed off its data before removing it from the cluster. Fundamentally, this comes down to some form of shared file system across the cluster. Admittedly, there are very complex issues involved, not the least of which is proper synchronization. Yet the concept could provide huge benefits, especially in terms of load balancing across the cluster. -Mark
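To make the transfer-versus-wait tradeoff concrete, here is a minimal sketch of the kind of heuristic a node could apply. Everything here is an illustrative assumption - the function name, the rate inputs, and the sequential-transfer model are not NiFi internals:

```python
def should_offload(queued_bytes: int,
                   local_rate: float,
                   remote_rate: float,
                   transfer_rate: float) -> bool:
    """Decide whether to ship a backlog to a less-busy peer.

    All rates are in bytes/sec. This is a hypothetical heuristic for
    discussion, not NiFi's actual load-balancing logic: offload only if
    the estimated time to transfer the backlog and process it remotely
    beats the estimated time to drain it locally.
    """
    if local_rate <= 0:
        # Local node is stuck (crashed processor, zero throughput):
        # offloading is the only way the data makes progress.
        return True
    local_eta = queued_bytes / local_rate
    # Simple model: transfer completes before remote processing starts
    # (no pipelining assumed).
    remote_eta = queued_bytes / transfer_rate + queued_bytes / remote_rate
    return remote_eta < local_eta
```

Under this model a transient spike on a fast network favors offloading, while a slow network makes waiting out the spike locally the better choice - which is exactly the balance that would need tuning in practice.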
