Hi Joe,

Thanks a lot for your explanation and suggestion. As for the clustering question, what I actually want to ask is this: with a two-node cluster and a "Y"-shaped dataflow, will the two nodes each work on one of the two branches? If so, what happens after the results are merged at the processor where the branches join? Does one node become idle?

Regards,
Yang

> On 25 Apr 2016, at 17:44, Joe Percivall <[email protected]> wrote:
>
> Hello Yang,
>
> To better understand how data flows through NiFi to the processors, you need
> to understand FlowFiles. FlowFiles are the data records that get processed by
> the processors. A FlowFile is a pointer to content plus a collection of
> attributes. So each time, a processor acts on the entire FlowFile produced
> by the previous processor.
>
> For clustering, the flow is replicated to each node of the cluster. This
> means each node in the cluster has a copy of the flow, which it uses to
> process all data sent to it (except for processors marked as "primary node"
> only, but that's a bit more advanced).
>
> Also, for a better-worded, more in-depth look into NiFi, I would suggest
> checking out the PR for the "NiFi In Depth" doc [1]. It would help answer many
> questions you may have about the internals of NiFi. Any comments on it
> are much appreciated.
>
> [1] https://github.com/apache/nifi/pull/339#discussion_r60103526
>
> Joe
>
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
>
>
> On Monday, April 25, 2016 11:21 AM, Yuanzhe Yang (杨远哲) <[email protected]> wrote:
>
> Hi,
>
> I have read some documentation about NiFi, but I haven't got a clear
> impression of how data flows inside NiFi. Is it processed in a streaming
> fashion? Or does a processor get the entire intermediate result produced by
> its previous processor? Moreover, what is the granularity of clustering? Is
> it at the dataflow level or the processor level?
>
> Thank you very much for your clarification; your work is very much
> appreciated.
>
> Regards,
> Yang
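
For readers of this thread, the model Joe describes can be pictured with a small toy sketch. It is only an illustration under stated assumptions: `FlowFile`, `Node`, `tag`, and the step names are hypothetical stand-ins invented here, not NiFi's actual API. It shows a FlowFile as a pointer to content plus attributes, and a flow that is copied to every node, with each node running the whole flow on the data sent to it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FlowFile:
    """Hypothetical stand-in: a pointer to content plus a collection of attributes."""
    content_ref: str                      # a reference to the content, not the bytes
    attributes: Dict[str, str] = field(default_factory=dict)


Processor = Callable[[FlowFile], FlowFile]


def tag(step: str) -> Processor:
    """Build a toy processor that records its step name in the 'steps' attribute."""
    def process(ff: FlowFile) -> FlowFile:
        history = ff.attributes.get("steps", "")
        steps = (history + "," + step).lstrip(",")
        # Each processor receives the whole FlowFile emitted by the previous one.
        return FlowFile(ff.content_ref, dict(ff.attributes, steps=steps))
    return process


@dataclass
class Node:
    """Hypothetical cluster node holding its own copy of the flow."""
    name: str
    flow: List[Processor]

    def run(self, incoming: List[FlowFile]) -> List[FlowFile]:
        results = []
        for ff in incoming:
            for processor in self.flow:   # the entire flow runs on this node
                ff = processor(ff)
            ff.attributes["node"] = self.name
            results.append(ff)
        return results


# The same flow definition is replicated to both nodes (assumed step names).
flow = [tag("fetch"), tag("transform"), tag("publish")]
nodes = [Node("node-1", list(flow)), Node("node-2", list(flow))]

# Each node only works on the data that was sent to it.
inbox = {
    "node-1": [FlowFile("claim-a"), FlowFile("claim-b")],
    "node-2": [FlowFile("claim-c")],
}
processed = {n.name: n.run(inbox[n.name]) for n in nodes}

for name, flowfiles in processed.items():
    for ff in flowfiles:
        print(name, ff.content_ref, ff.attributes["steps"])
```

In this toy model the unit of distribution is the data arriving at each node, not individual processors; it leaves out "primary node" processors and anything else about how a real cluster coordinates.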
