Hi Joe,

Thanks a lot for your detailed explanation. That’s clear now :D

Regards,
Yang

> On 26 Apr 2016, at 01:16, Joe Percivall <[email protected]> wrote:
>
> Hello Yang,
>
> For a cluster with a "Y" style dataflow, each node will run a copy of
> the whole flow. This means that at the merging point, only data within a
> node will get merged.
>
> A little bit of a metaphor: say you want to create toys that combine
> multiple different parts (input data) and you have two workers (nodes).
> The way that NiFi would break up the work is to give each worker the
> blueprints (the data flow) for the entire toy, and each works on the
> necessary raw materials independently to create their own end product
> (end merged FlowFile). Raw materials from one worker are never merged
> with the raw materials of the other; they are worked on independently.
>
> NiFi uses the same concept of isolating the work to independent workers.
>
> There is a little wiggle room with re-distributing work to the nodes
> using S2S and using the "primary node only" scheduling strategy, but
> those are special cases.
>
> Hope that metaphor helps a bit,
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
>
>
> On Monday, April 25, 2016 6:29 PM, Yuanzhe Yang (杨远哲) <[email protected]> wrote:
> Hi Joe,
>
> Thanks a lot for your explanation and suggestion. As for the clustering
> question, what I actually want to ask is this: when we have, for example,
> a two-node cluster and a “Y” style dataflow, will the two nodes work on
> the two branches respectively? If so, what will happen after the results
> are merged at the intersection processor? Does one node become idle?
>
> Regards,
> Yang
>
>
>> On 25 Apr 2016, at 17:44, Joe Percivall <[email protected]> wrote:
>>
>> Hello Yang,
>>
>> To better understand how data flows through NiFi to the processors, you
>> need to understand FlowFiles. FlowFiles are the data records that get
>> processed by the processors. A FlowFile is a pointer to content and a
>> collection of attributes. So each time, a processor acts on the entire
>> FlowFile produced by the previous processor.
>>
>> For clustering, the flow is replicated to each node of the cluster. This
>> means each node in the cluster has a copy of the flow, which it uses to
>> process all data sent to it (except for processors marked as "primary
>> node" only, but that's a bit more advanced).
>>
>> Also, for a better-worded, more in-depth look into NiFi, I would suggest
>> checking out the PR for the "NiFi In Depth" doc[1]. It would help answer
>> many questions you may have about the internals of NiFi. Any comments
>> on it are much appreciated.
>>
>> [1] https://github.com/apache/nifi/pull/339#discussion_r60103526
>>
>> Joe
>>
>> - - - - - -
>> Joseph Percivall
>> linkedin.com/in/Percivall
>> e: [email protected]
>>
>>
>> On Monday, April 25, 2016 11:21 AM, Yuanzhe Yang (杨远哲) <[email protected]> wrote:
>> Hi,
>>
>> I have read some documentation about NiFi, but I still don't have a
>> clear picture of how data flows inside NiFi. Is it processed as a
>> stream? Or does a processor get the entire intermediate result produced
>> by its previous processor? Moreover, what is the granularity of
>> clustering? Is it at the dataflow level or the processor level?
>>
>> Thank you very much for your clarification; your work is very much
>> appreciated.
>>
>> Regards,
>> Yang
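
As a rough illustration of the worker metaphor in this thread, here is a toy Python simulation of a two-node cluster where every node runs its own copy of the same "Y" flow and merges only its own data. This is not NiFi code: the `FlowFile` class, the `run_y_flow` function, the node names, the branch transformations, and the `toy.count` attribute are all made up for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy stand-in for a FlowFile: some content plus a map of attributes."""
    content: str
    attributes: dict = field(default_factory=dict)


def run_y_flow(node_name, left_inputs, right_inputs):
    """Run the whole 'Y' flow on ONE node: two branches, then a merge.

    Every node calls this same function (the replicated flow), but only
    ever sees the inputs that were delivered to that node.
    """
    # Left branch: hypothetical transformation (upper-case the content).
    left = [FlowFile(c.upper(), {"branch": "left"}) for c in left_inputs]
    # Right branch: hypothetical transformation (reverse the content).
    right = [FlowFile(c[::-1], {"branch": "right"}) for c in right_inputs]
    # Merge point: combines only the FlowFiles that exist on this node.
    parts = left + right
    merged = "|".join(ff.content for ff in parts)
    return FlowFile(merged, {"node": node_name, "toy.count": str(len(parts))})


# Data that happened to arrive at each node: (left inputs, right inputs).
cluster_input = {
    "node-1": (["a", "b"], ["xy"]),
    "node-2": (["c"], ["zw", "uv"]),
}

# Each node produces its own merged FlowFile; nothing crosses node boundaries.
results = {
    node: run_y_flow(node, left, right)
    for node, (left, right) in cluster_input.items()
}

for node, ff in results.items():
    print(node, ff.content, ff.attributes)
```

Both nodes execute both branches and the merge, so neither goes idle after the merge point; the cluster simply ends up with one merged result per node rather than a single cluster-wide one.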
