Hello Yang,

To understand how data flows through NiFi, you first need to understand FlowFiles. A FlowFile is the data record that processors operate on: it is a pointer to content plus a collection of attributes. So, to answer your first question, each processor acts on the entire FlowFile produced by the previous processor.
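To make the "pointer to content plus attributes" idea concrete, here is a minimal Python sketch. It is a simplified model under stated assumptions, not NiFi's API (NiFi itself is written in Java). `FlowFile`, `CONTENT_REPO`, `create_flowfile`, `update_attribute` and `to_upper` are hypothetical names, and a plain dict stands in for NiFi's content repository. The sketch shows each step receiving the whole FlowFile from the previous step. In this model, a step that only changes metadata reuses the same content pointer, and a step that rewrites content produces a new pointer.

```python
from collections.abc import Mapping
from dataclasses import dataclass, replace
from types import MappingProxyType

# Hypothetical stand-in for a content repository: content bytes are stored
# once, and FlowFiles refer to them by key instead of carrying the bytes.
CONTENT_REPO: dict[str, bytes] = {}


@dataclass(frozen=True)
class FlowFile:
    """Simplified model: a pointer to content plus a collection of attributes."""
    content_claim: str             # key into CONTENT_REPO (the "pointer")
    attributes: Mapping[str, str]  # read-only key/value metadata

    def read_content(self) -> bytes:
        return CONTENT_REPO[self.content_claim]


def create_flowfile(claim: str, data: bytes, **attrs: str) -> FlowFile:
    """Store the content once and return a FlowFile pointing at it."""
    CONTENT_REPO[claim] = data
    return FlowFile(claim, MappingProxyType(dict(attrs)))


def update_attribute(ff: FlowFile, key: str, value: str) -> FlowFile:
    """Metadata-only step: copies the attributes, reuses the content pointer."""
    new_attrs = dict(ff.attributes)
    new_attrs[key] = value
    return replace(ff, attributes=MappingProxyType(new_attrs))


def to_upper(ff: FlowFile) -> FlowFile:
    """Content-rewriting step: writes new content under a new pointer,
    keeping the incoming attributes."""
    new_claim = ff.content_claim + "-upper"
    CONTENT_REPO[new_claim] = ff.read_content().upper()
    return replace(ff, content_claim=new_claim)


# Each step receives the whole FlowFile emitted by the previous step.
ff0 = create_flowfile("claim-1", b"hello nifi", filename="greeting.txt")
ff1 = update_attribute(ff0, "source", "example")
ff2 = to_upper(ff1)
```

Keeping FlowFiles immutable and sharing content by reference is a design choice of this sketch. It keeps each step's input intact and avoids copying content bytes when only attributes change.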
For clustering, the granularity is the whole dataflow: the flow is replicated to every node in the cluster. Each node has its own copy of the flow, which it uses to process all data sent to it. The exception is processors marked to run on the "primary node" only, but that's a bit more advanced.

For a better-worded, more in-depth look at NiFi, I suggest checking out the PR for the "NiFi In Depth" doc [1]. It should answer many questions you may have about NiFi's internals. Any comments on it are also much appreciated.

[1] https://github.com/apache/nifi/pull/339#discussion_r60103526

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: [email protected]

On Monday, April 25, 2016 11:21 AM, Yuanzhe Yang (杨远哲) <[email protected]> wrote:

Hi,

I have read some documentation about NiFi, but I haven't got a clear impression about how data flows inside NiFi. Is it processed streamingly? Or does a processor get the entire intermediate result produced by its previous processor?

Moreover, what is the granularity of clustering? Is it dataflow level or processor level?

Thank you very much for your clarification and your work is very much appreciated.

Regards,
Yang
