All the files are here: https://gist.github.com/michalklempa
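
For anyone reproducing the setup described in the thread below, here is a minimal sketch of the cluster-related keys in nifi.properties. The hostnames and the HTTP port 8081 come from the thread; the protocol port, ZooKeeper ports and site-to-site port are assumptions (the actual files are in the gist above):
```
# Hypothetical excerpt of nifi-cluster-04.properties (ClusterA);
# values other than the hostname and web port are assumptions.
cat <<'EOF'
nifi.web.http.host=nifi-cluster-04
nifi.web.http.port=8081
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-cluster-04
nifi.cluster.node.protocol.port=9088
nifi.zookeeper.connect.string=nifi-cluster-04:2181,nifi-cluster-05:2181,nifi-cluster-06:2181
nifi.state.management.embedded.zookeeper.start=false
# site-to-site input socket, used by the RemoteProcessGroup on ClusterB
nifi.remote.input.host=nifi-cluster-04
nifi.remote.input.socket.port=10000
nifi.remote.input.secure=false
EOF
```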
On Tue, Jan 10, 2017 at 9:50 AM, Michal Klempa <[email protected]> wrote:
> Hi,
> we have been doing some tests with a NiFi cluster and similar questions arose.
> Our configuration is as follows:
>
> NiFi ClusterA:
> 172.31.12.232 nifi-cluster-04 (sample configuration nifi-cluster-04.properties in attachment)
> 172.31.5.194 nifi-cluster-05
> 172.31.15.84 nifi-cluster-06
> Standalone ZooKeeper, 3 instances, sample configuration nifi-cluster-04.zoo.cfg in attachment.
>
> NiFi ClusterB:
> 172.31.9.147 nifi-cluster-01 (sample configuration nifi-cluster-01.properties in attachment)
> 172.31.24.77 nifi-cluster-02
> 172.31.8.152 nifi-cluster-03
> Standalone ZooKeeper, 3 instances, sample configuration nifi-cluster-01.zoo.cfg in attachment.
>
> We tested the following:
> ClusterA_flow (template in attachment):
> GenerateFlowFile -> output port "to_clusterB" (the port to be imported as a RemoteProcessGroup target from ClusterB)
>                  -> PutFile ("/tmp/clusterA", create missing dirs: false)
>
> ClusterB_flow (template in attachment):
> RemoteProcessGroup (attached to 172.31.12.232:8081/nifi, remote port: "to_clusterB")
> -> PutFile ("/tmp/clusterB", create missing dirs: false)
>
> The testing scenario is: GenerateFlowFile in ClusterA, send the file to the output port "to_clusterB" and also to PutFile ("/tmp/clusterA"). Receive the FlowFile from the RemoteProcessGroup in ClusterB and save it to "/tmp/clusterB" on the ClusterB machines.
>
> The following situations were tested:
>
> Situation 1: All nodes are up and running. Three FlowFiles are generated in ClusterA, one on each node, and all three are transferred to ClusterB, although the distribution on ClusterB is not even. When we rerun GenerateFlowFile (e.g. every 10 sec) 4 times, we get 12 FlowFiles generated in ClusterA (4 on each node), but on ClusterB we got 6 on node nifi-cluster-01, 2 on node nifi-cluster-02 and 4 on node nifi-cluster-03. Although the distribution is not even, the FlowFiles are properly transferred to ClusterB, and that is what matters.
> Conclusion: if everything is green, everything works as expected (and the same as with separate NiFi instances).
>
> Situation 2: We ran GenerateFlowFile 2 times on ClusterA and the FlowFiles were successfully transferred to ClusterB. Then we removed the target directory "/tmp/clusterB" on node nifi-cluster-01 and executed GenerateFlowFile two more times. As the PutFile there is configured to NOT create target directories, we expected errors; the key point is how the NiFi cluster can help with resolution. Although the Failure relationship from PutFile is routed back as input to PutFile, the result was: 12 FlowFiles generated in ClusterA (4 on each node), but after the directory removal on node nifi-cluster-01, 6 FlowFiles remained stuck on node nifi-cluster-01, circling around PutFile with a "target directory does not exist" error.
> Conclusion: From this we can see that although we have a cluster setup, the nodes do balance somewhere inside the RemoteProcessGroup, but they do not rebalance FlowFiles stuck on relationships once they have entered the flow, even after they are penalized by the processor. Is this the desired behavior? Are there any plans to improve on this?
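
A minimal bash sketch of the fault injection in Situation 2, in case anyone wants to reproduce it; the hostname and the /tmp/clusterB path are from the thread, while shell access to the node and everything else is an assumption:
```
# Reproduce Situation 2 on nifi-cluster-01 (ClusterB): break PutFile, then restore.
ssh nifi-cluster-01 'rm -rf /tmp/clusterB'    # PutFile now fails: target directory does not exist
# ...run GenerateFlowFile on ClusterA a couple more times; the failed FlowFiles
#    keep looping on the Failure relationship on this one node only...
ssh nifi-cluster-01 'mkdir -p /tmp/clusterB'  # restore; the queued FlowFiles should then drain
```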
> Situation 3: We ran GenerateFlowFile 2 times on ClusterA and the FlowFiles were successfully transferred to ClusterB. Then we shielded node nifi-cluster-01 (ClusterB) using iptables, so that NiFi and ZooKeeper would become unreachable on this node.
> Iptables commands used:
> ```
> iptables -A INPUT -p tcp --sport 513:65535 --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
> iptables -A OUTPUT -p tcp --sport 22 --dport 513:65535 -m state --state ESTABLISHED -j ACCEPT
> iptables -A INPUT -j DROP
> iptables -A OUTPUT -j DROP
> ```
> This should simulate a hardware failure from NiFi's and ZooKeeper's point of view.
> We then executed GenerateFlowFile two more times. The result was: 6 FlowFiles generated in ClusterA (4 on each node); after shielding the nifi-cluster-01 node, 6 more FlowFiles were transferred to ClusterB (distributed unevenly across nodes nifi-cluster-02 and nifi-cluster-03).
> Conclusion: From this we can see that the NiFi cluster setup does help with the transfer of FlowFiles if one of the destination nodes becomes unavailable. For separate NiFi instances, we are still trying to figure out how to arrange the flows to achieve this behavior. Any ideas?
>
> Situation 4: We ran GenerateFlowFile 2 times on ClusterA and the FlowFiles were successfully transferred to ClusterB. Then we shielded node nifi-cluster-04 (ClusterA) using iptables, so that NiFi and ZooKeeper would become unreachable on this node.
> Iptables commands used:
> ```
> iptables -A INPUT -p tcp --sport 513:65535 --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
> iptables -A OUTPUT -p tcp --sport 22 --dport 513:65535 -m state --state ESTABLISHED -j ACCEPT
> iptables -A INPUT -j DROP
> iptables -A OUTPUT -j DROP
> ```
> This should simulate a hardware failure from NiFi's and ZooKeeper's point of view.
>
> GenerateFlowFile kept executing on schedule and we were unable to stop it, as the UI became unavailable on ClusterA. After shielding the nifi-cluster-04 node, the remaining 2 nodes in ClusterA kept generating FlowFiles and these were transferred to ClusterB, so the flow was running. But it was unmanageable, as the UI was unavailable.
> Conclusion: From this we can see that the NiFi cluster setup does help with the transfer of FlowFiles if one of the source nodes becomes unavailable. Unfortunately, we experienced UI issues. For separate NiFi instances, we are still trying to figure out how to arrange the flows to achieve this behavior. Any ideas?
>
> * * *
>
> Moreover, we tested the upgrade process for flow.xml.gz. Currently we use separate NiFi instances managed by Ansible (+Jenkins). The flow.xml.gz upgrade job basically consists of:
> 1. service nifi stop
> 2. back up the old flow.xml.gz and place the new one into the NiFi conf/ directory
> 3. service nifi start
> As our flows are pre-tested in a staging environment, we have never experienced issues in production such as NiFi failing to start because of a damaged flow.xml.gz. Everything works fine. Even if something were to break, we have other separate hot production NiFi instances running with the old flow.xml.gz, so the overall flow keeps running through the other nodes (with a performance hit, of course), and we can still revert to the original flow.xml.gz on the single node we are upgrading at a time.
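
The three-step standalone upgrade described above, as a rough bash sketch of what one iteration of such an Ansible job might do; the service name, paths and backup naming are assumptions, not the actual job from the thread:
```
# Upgrade flow.xml.gz on one standalone NiFi node at a time.
set -euo pipefail
NEW_FLOW=/srv/deploy/flow.xml.gz   # assumed location of the pre-tested flow
NIFI_CONF=/opt/nifi/conf           # assumed NiFi conf/ directory

service nifi stop
cp "$NIFI_CONF/flow.xml.gz" "$NIFI_CONF/flow.xml.gz.$(date +%Y%m%d%H%M%S).bak"  # keep the old flow as a revert point
cp "$NEW_FLOW" "$NIFI_CONF/flow.xml.gz"                                         # place the new flow
service nifi start
```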
> Now the question is: if we are going to use the NiFi cluster feature, how can we achieve rolling upgrades of flow.xml.gz? Should we run a separate NiFi cluster and switch between the two clusters?
> We experienced this behavior: a NiFi instance does not join the cluster if its flow.xml.gz differs. We had to shut down all NiFi instances in the cluster for a while and start a single one with the new flow.xml.gz to seed the cluster flow with the new version. Then we were forced to deploy the new flow.xml.gz to the other 2 nodes as well, as they refused to join the cluster :)
>
> * * *
>
> For our use cases, for now, we find separate NiFi instances superior to a NiFi cluster, mainly because of the flow.xml.gz upgrade (unless somebody gives us advice on this! Thank you).
> Regarding flow balancing and the setup of inter-cluster communication, we do not know how to achieve this without a NiFi cluster setup. As our flow is very simple for now and can basically run in parallel on multiple single instances, the separate NiFi instances work well (our source system even supports balancing across multiple IPs, so we do not have to bother with setting up a balanced IP on routers).
>
> Any comments are welcome. Thanks.
> Michal Klempa
>
> On Sat, Dec 10, 2016 at 9:03 AM, Caton, Nigel <[email protected]> wrote:
>> Thanks Bryan.
>>
>> On 2016-12-09 15:32 (-0000), Bryan Bende <[email protected]> wrote:
>>> Nigel,
>>>
>>> The advantage of using a cluster is that whenever you change something in the UI, it will be changed on all nodes, and you also get a central view of the metrics/stats across all nodes. If you use standalone nodes you would have to go to each node and make the same changes.
>>>
>>> It sounds like you are probably doing automatic deployments of a flow that you set up elsewhere and aren't planning to ever modify the production nodes, so maybe the above is a non-issue for you.
>>>
>>> The rolling deployment scenario depends on whether you are updating the flow or just code. For example, if you are just updating code then you should be able to do a rolling deployment in a cluster, but if you are updating the flow then I don't think it will work, because a node will come up with the new flow and attempt to join the cluster, and the cluster won't accept it because the flow is different.
>>>
>>> Hope that helps.
>>>
>>> -Bryan
>>>
>>>
>>> On Fri, Dec 9, 2016 at 9:33 AM, Caton, Nigel <[email protected]> wrote:
>>>
>>> > Are there any views of the pros/cons of running a native NiFi cluster versus a cluster of standalone
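
A side note on the node-failure tests (Situations 3 and 4): the cluster connection state can also be checked from the command line, which helps when the UI of a cluster becomes unavailable. A hedged sketch, assuming an unsecured cluster answering HTTP on port 8081 as in the thread; the endpoint and response fields are from the NiFi 1.x REST API, so please verify against your version:
```
# Ask any reachable node for the cluster summary (unsecured HTTP assumed).
curl -s http://nifi-cluster-02:8081/nifi-api/flow/cluster/summary
# A response like {"clusterSummary":{"connectedNodes":"2 / 3", ...}} shows that
# one node (e.g. the iptables-shielded one) is no longer connected.
```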
