Hi - I have a question regarding load distribution in a clustered NiFi environment. I have a really simple example: I'm using the GenerateFlowFile processor to generate some random data, then I MD5-hash the file and print out the resulting hash.
I want only the primary node to generate the data, but I want both nodes in the cluster to share the hashing workload. It appears that if I set the scheduling strategy to "On primary node" for the GenerateFlowFile processor, the data for the next processor (HashContent) is accepted and processed by only a single node.

I've put a DistributeLoad processor between GenerateFlowFile and HashContent, but this requires me to use a remote process group to distribute the load, which doesn't seem intuitive when I'm already clustered.

I guess my question is: is it possible for the DistributeLoad processor to understand that NiFi is in a clustered environment, and to distribute the work for the next processor (HashContent) among all nodes in the cluster?

Cheers,
--
Ricky Saltzer
http://www.cloudera.com
