All the files are here: https://gist.github.com/michalklempa
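
For anyone reproducing the setup described in the thread below, here is a minimal sketch of the cluster-related keys in nifi.properties. The hostnames and the HTTP port 8081 come from the thread; the protocol port, ZooKeeper ports and site-to-site port are assumptions (the actual files are in the gist above):
```
# Hypothetical excerpt of nifi-cluster-04.properties (ClusterA);
# values other than the hostname and web port are assumptions.
cat <<'EOF'
nifi.web.http.host=nifi-cluster-04
nifi.web.http.port=8081
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-cluster-04
nifi.cluster.node.protocol.port=9088
nifi.zookeeper.connect.string=nifi-cluster-04:2181,nifi-cluster-05:2181,nifi-cluster-06:2181
nifi.state.management.embedded.zookeeper.start=false
# site-to-site input socket, used by the RemoteProcessGroup on ClusterB
nifi.remote.input.host=nifi-cluster-04
nifi.remote.input.socket.port=10000
nifi.remote.input.secure=false
EOF
```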
On Tue, Jan 10, 2017 at 9:50 AM, Michal Klempa <[email protected]> wrote:
> Hi,
> we have been doing some tests with a NiFi cluster and similar questions arose.
> Our configuration is as follows:
>
> NiFi ClusterA:
> 172.31.12.232 nifi-cluster-04 (sample configuration nifi-cluster-04.properties in attachment)
> 172.31.5.194 nifi-cluster-05
> 172.31.15.84 nifi-cluster-06
> Standalone ZooKeeper, 3 instances, sample configuration nifi-cluster-04.zoo.cfg in attachment.
>
> NiFi ClusterB:
> 172.31.9.147 nifi-cluster-01 (sample configuration nifi-cluster-01.properties in attachment)
> 172.31.24.77 nifi-cluster-02
> 172.31.8.152 nifi-cluster-03
> Standalone ZooKeeper, 3 instances, sample configuration nifi-cluster-01.zoo.cfg in attachment.
>
> We tested the following:
> ClusterA_flow (template in attachment):
> GenerateFlowFile -> output port "to_clusterB" (the port to be imported as a RemoteProcessGroup target from ClusterB)
>                  -> PutFile ("/tmp/clusterA", create missing dirs: false)
>
> ClusterB_flow (template in attachment):
> RemoteProcessGroup (attached to 172.31.12.232:8081/nifi, remote port: "to_clusterB")
> -> PutFile ("/tmp/clusterB", create missing dirs: false)
>
> The testing scenario is: GenerateFlowFile in ClusterA, send the file to the output port "to_clusterB" and also to PutFile ("/tmp/clusterA"). Receive the FlowFile from the RemoteProcessGroup in ClusterB and save it to "/tmp/clusterB" on the ClusterB machines.
>
> The following situations were tested:
>
> Situation 1: All nodes are up and running. Three FlowFiles are generated in ClusterA, one on each node, and all three are transferred to ClusterB, although the distribution on ClusterB is not even. When we rerun GenerateFlowFile (e.g. every 10 sec) 4 times, we get 12 FlowFiles generated in ClusterA (4 on each node), but on ClusterB we got 6 on node nifi-cluster-01, 2 on node nifi-cluster-02 and 4 on node nifi-cluster-03. Although the distribution is not even, the FlowFiles are properly transferred to ClusterB, and that is what matters.
> Conclusion: if everything is green, everything works as expected (and the same as with separate NiFi instances).
>
> Situation 2: We ran GenerateFlowFile 2 times on ClusterA and the FlowFiles were successfully transferred to ClusterB. Then we removed the target directory "/tmp/clusterB" on node nifi-cluster-01 and executed GenerateFlowFile two more times. As the PutFile there is configured to NOT create target directories, we expected errors; the key point is how the NiFi cluster can help with resolution. Although the Failure relationship from PutFile is routed back as input to PutFile, the result was: 12 FlowFiles generated in ClusterA (4 on each node), but after the directory removal on node nifi-cluster-01, 6 FlowFiles remained stuck on node nifi-cluster-01, circling around PutFile with a "target directory does not exist" error.
> Conclusion: From this we can see that although we have a cluster setup, the nodes do balance somewhere inside the RemoteProcessGroup, but they do not rebalance FlowFiles stuck on relationships once they have entered the flow, even after they are penalized by the processor. Is this the desired behavior? Are there any plans to improve on this?
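
A minimal bash sketch of the fault injection in Situation 2, in case anyone wants to reproduce it; the hostname and the /tmp/clusterB path are from the thread, while shell access to the node and everything else is an assumption:
```
# Reproduce Situation 2 on nifi-cluster-01 (ClusterB): break PutFile, then restore.
ssh nifi-cluster-01 'rm -rf /tmp/clusterB'    # PutFile now fails: target directory does not exist
# ...run GenerateFlowFile on ClusterA a couple more times; the failed FlowFiles
#    keep looping on the Failure relationship on this one node only...
ssh nifi-cluster-01 'mkdir -p /tmp/clusterB'  # restore; the queued FlowFiles should then drain
```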
> Situation 3: We ran GenerateFlowFile 2 times on ClusterA and the FlowFiles were successfully transferred to ClusterB. Then we shielded node nifi-cluster-01 (ClusterB) using iptables, so that NiFi and ZooKeeper would become unreachable on this node.
> Iptables commands used:
> ```
> iptables -A INPUT -p tcp --sport 513:65535 --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
> iptables -A OUTPUT -p tcp --sport 22 --dport 513:65535 -m state --state ESTABLISHED -j ACCEPT
> iptables -A INPUT -j DROP
> iptables -A OUTPUT -j DROP
> ```
> This should simulate a hardware failure from NiFi's and ZooKeeper's point of view.
> We then executed GenerateFlowFile two more times. The result was: 6 FlowFiles generated in ClusterA (4 on each node); after shielding the nifi-cluster-01 node, 6 more FlowFiles were transferred to ClusterB (distributed unevenly across nodes nifi-cluster-02 and nifi-cluster-03).
> Conclusion: From this we can see that the NiFi cluster setup does help with the transfer of FlowFiles if one of the destination nodes becomes unavailable. For separate NiFi instances, we are still trying to figure out how to arrange the flows to achieve this behavior. Any ideas?
>
> Situation 4: We ran GenerateFlowFile 2 times on ClusterA and the FlowFiles were successfully transferred to ClusterB. Then we shielded node nifi-cluster-04 (ClusterA) using iptables, so that NiFi and ZooKeeper would become unreachable on this node.
> Iptables commands used:
> ```
> iptables -A INPUT -p tcp --sport 513:65535 --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
> iptables -A OUTPUT -p tcp --sport 22 --dport 513:65535 -m state --state ESTABLISHED -j ACCEPT
> iptables -A INPUT -j DROP
> iptables -A OUTPUT -j DROP
> ```
> This should simulate a hardware failure from NiFi's and ZooKeeper's point of view.
>
> GenerateFlowFile kept executing on schedule and we were unable to stop it, as the UI became unavailable on ClusterA. After shielding the nifi-cluster-04 node, the remaining 2 nodes in ClusterA kept generating FlowFiles and these were transferred to ClusterB, so the flow was running. But it was unmanageable, as the UI was unavailable.
> Conclusion: From this we can see that the NiFi cluster setup does help with the transfer of FlowFiles if one of the source nodes becomes unavailable. Unfortunately, we experienced UI issues. For separate NiFi instances, we are still trying to figure out how to arrange the flows to achieve this behavior. Any ideas?
>
> * * *
>
> Moreover, we tested the upgrade process for flow.xml.gz. Currently we use separate NiFi instances managed by Ansible (+Jenkins). The flow.xml.gz upgrade job basically consists of:
> 1. service nifi stop
> 2. back up the old flow.xml.gz and place the new one into the NiFi conf/ directory
> 3. service nifi start
> As our flows are pre-tested in a staging environment, we have never experienced issues in production such as NiFi failing to start because of a damaged flow.xml.gz. Everything works fine. Even if something were to break, we have other separate hot production NiFi instances running with the old flow.xml.gz, so the overall flow keeps running through the other nodes (with a performance hit, of course), and we can still revert to the original flow.xml.gz on the single node we are upgrading at a time.
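
The three-step standalone upgrade described above, as a rough bash sketch of what one iteration of such an Ansible job might do; the service name, paths and backup naming are assumptions, not the actual job from the thread:
```
# Upgrade flow.xml.gz on one standalone NiFi node at a time.
set -euo pipefail
NEW_FLOW=/srv/deploy/flow.xml.gz   # assumed location of the pre-tested flow
NIFI_CONF=/opt/nifi/conf           # assumed NiFi conf/ directory

service nifi stop
cp "$NIFI_CONF/flow.xml.gz" "$NIFI_CONF/flow.xml.gz.$(date +%Y%m%d%H%M%S).bak"  # keep the old flow as a revert point
cp "$NEW_FLOW" "$NIFI_CONF/flow.xml.gz"                                         # place the new flow
service nifi start
```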
> Now the question is: if we are going to use the NiFi cluster feature, how can we achieve rolling upgrades of flow.xml.gz? Should we run a separate NiFi cluster and switch between the two clusters?
> We experienced this behavior: a NiFi instance does not join the cluster if its flow.xml.gz differs. We had to shut down all NiFi instances in the cluster for a while and start a single one with the new flow.xml.gz to seed the cluster flow with the new version. Then we were forced to deploy the new flow.xml.gz to the other 2 nodes as well, as they refused to join the cluster :)
>
> * * *
>
> For our use cases, for now, we find separate NiFi instances superior to a NiFi cluster, mainly because of the flow.xml.gz upgrade (unless somebody gives us advice on this! Thank you).
> Regarding flow balancing and the setup of inter-cluster communication, we do not know how to achieve this without a NiFi cluster setup. As our flow is very simple for now and can basically run in parallel on multiple single instances, the separate NiFi instances work well (our source system even supports balancing across multiple IPs, so we do not have to bother with setting up a balanced IP on routers).
>
> Any comments are welcome. Thanks.
> Michal Klempa
>
> On Sat, Dec 10, 2016 at 9:03 AM, Caton, Nigel <[email protected]> wrote:
>> Thanks Bryan.
>>
>> On 2016-12-09 15:32 (-0000), Bryan Bende <[email protected]> wrote:
>>> Nigel,
>>>
>>> The advantage of using a cluster is that whenever you change something in the UI, it will be changed on all nodes, and you also get a central view of the metrics/stats across all nodes. If you use standalone nodes you would have to go to each node and make the same changes.
>>>
>>> It sounds like you are probably doing automatic deployments of a flow that you set up elsewhere and aren't planning to ever modify the production nodes, so maybe the above is a non-issue for you.
>>>
>>> The rolling deployment scenario depends on whether you are updating the flow or just code. For example, if you are just updating code then you should be able to do a rolling deployment in a cluster, but if you are updating the flow then I don't think it will work, because a node will come up with the new flow and attempt to join the cluster, and the cluster won't accept it because the flow is different.
>>>
>>> Hope that helps.
>>>
>>> -Bryan
>>>
>>>
>>> On Fri, Dec 9, 2016 at 9:33 AM, Caton, Nigel <[email protected]> wrote:
>>>
>>> > Are there any views of the pros/cons of running a native NiFi cluster versus a cluster of standalone
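
A side note on the node-failure tests (Situations 3 and 4): the cluster connection state can also be checked from the command line, which helps when the UI of a cluster becomes unavailable. A hedged sketch, assuming an unsecured cluster answering HTTP on port 8081 as in the thread; the endpoint and response fields are from the NiFi 1.x REST API, so please verify against your version:
```
# Ask any reachable node for the cluster summary (unsecured HTTP assumed).
curl -s http://nifi-cluster-02:8081/nifi-api/flow/cluster/summary
# A response like {"clusterSummary":{"connectedNodes":"2 / 3", ...}} shows that
# one node (e.g. the iptables-shielded one) is no longer connected.
```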
