Seshu,

The Open/Save/Deploy Flow approach is the model used almost everywhere other than NiFi. That model is exactly the driving force that led us to create NiFi in the first place: to avoid requiring a developer to maintain these flows and deploy them.
With the Open/Save/Deploy model, there is a very large disconnect between the domain expert responsible for understanding what the enterprise needs in terms of dataflow and the developer actually implementing it. The typical cycle looks like this: the dataflow expert determines that a change is needed and formalizes the requirement in writing. The requirement is sent to an engineering manager of some kind, who decides which developer is appropriate for the task and assigns it. The developer then implements the task as he or she interprets the requirements, and when all testing is complete, the change is deployed.

In the very best case, this cycle is long and drawn-out. In the worst case, the requirements were not exact enough or false assumptions were made, and what is deployed is not what the dataflow expert wanted. Or perhaps what was deployed is exactly what the dataflow expert asked for, but there was a slight flaw in the logic. The dataflow expert notices the problem, creates a new requirement, and the cycle starts over. These are very long iteration cycles, and the result is a long delay between having an idea and seeing it in production.

NiFi was largely built to address this issue: we want the dataflow expert to be able to make the change directly. The dataflow expert should be somewhat technical, as they will need to understand data formats and the like, but they need not be a developer by any means. The majority of NiFi operators who create and maintain flows are not developers but rather other subject matter experts.

That said, developers are often able to use NiFi to build some really interesting flows that an ops person wants to deploy. For this reason, we built the concept of a Template. The developer (or another operator) can export parts of their flow (or an entire flow) as a template file, which can then be imported into a different NiFi instance.
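To give a rough sense of what that looks like on disk: an exported template is an XML file. The sketch below is heavily stripped down and illustrative only; the XML that NiFi actually exports carries far more detail (processor properties, scheduling, positions, relationships, and so on), so treat the element contents here as placeholders rather than an exact schema:

```xml
<!-- Illustrative sketch of a NiFi template file; real exports are much richer -->
<template>
  <description>Example: pull log files and push them to HDFS</description>
  <name>Logs-to-HDFS</name>
  <snippet>
    <processors>
      <!-- processor definitions: type, configured properties, scheduling -->
    </processors>
    <connections>
      <!-- connections wiring the processors above together -->
    </connections>
  </snippet>
</template>
```

On another NiFi instance, importing that file through the UI instantiates the snippet onto that instance's graph, which is what makes the developer-builds/ops-deploys handoff possible today.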
We have talked about building a registry for such templates, but that doesn't yet exist. It is a key component that we want to work on, though. I don't believe that this in any way prevents enterprise-wide deployments of NiFi; it has been deployed in extremely large enterprises with great success as it was growing up. However, there may well be (and probably are) use cases that you have that I've not considered. I would very much love to chat more about this with you (as well as any other ideas or concerns that you may have with NiFi) going forward.

Thanks!
-Mark

----------------------------------------
> From: [email protected]
> To: [email protected]
> Subject: Re: Thoughts on the Blog Article [Apache NiFi: Thinking Differently
> About DataFlow]
> Date: Thu, 28 May 2015 12:58:18 +0000
>
> Mark,
>
> Thanks for the response. Are Process Groups the only abstraction for
> maintaining disparate flows? Did you consider the more traditional Open
> Flow/Save Flow approach?
>
> If I start thinking of NiFi as a replacement for enterprise ETL tools like
> Informatica/Ab Initio in the Hadoop world, I would introduce different
> personas, "Administrator": manages and monitors the flows, "ETL Developer":
> develops and deploys the flows, etc., and build an authorization model
> around it.
>
> It would definitely complicate the model, but would allow for an
> enterprise-wide deployment of NiFi. Would love to discuss more.
>
> Regards
> Seshu Adunuthula
>
>
> On 5/27/15, 2:18 PM, "Mark Payne" <[email protected]> wrote:
>
>>Seshu,
>>
>>Thanks for the e-mail and for sharing your concerns!
>>
>>So when we talk about combining multiple sources into a single flow, we
>>don't mean that all data should be combined into a single flow. It
>>absolutely makes sense to sometimes have very disparate flows! In some of
>>the instances we've run, we have dozens or more disparate flows.
>>The idea that I wanted to convey in the article is that just because 2
>>pieces of data come from different sources does not mean that they should
>>be different flows. But if the data needs to be handled very differently,
>>then it absolutely should be two different flows. Those flows can then
>>live side-by-side within the same instance of NiFi (generally in
>>different Process Groups so that the graph is maintainable).
>>
>>The idea of how to handle security and authorization is definitely an
>>ongoing debate. There are really two major approaches here. The first
>>approach, which we offer today, is to have a separate instance of NiFi
>>when different security and authorization is required. Remote Process
>>Groups/site-to-site functionality is then used to send the data between
>>flows. The rub here is that if you have many instances, it can be
>>difficult to manage them.
>>
>>The other approach would be to allow the security and authorization to
>>take place at the Process Group level, rather than the Flow Controller
>>level. This would be a very significant amount of work and may make the
>>application more difficult to use if the administrators then had to
>>manage each group independently. So there are definitely trade-offs to
>>each approach. If you have ideas about how you'd like to see it work,
>>please share them so that we can make NiFi as useful as possible.
>>
>>Thanks
>>-Mark
>>
>>----------------------------------------
>>> From: [email protected]
>>> To: [email protected]
>>> Subject: Thoughts on the Blog Article [Apache NiFi: Thinking
>>> Differently About DataFlow]
>>> Date: Tue, 26 May 2015 20:48:16 +0000
>>>
>>> Hello Folks,
>>>
>>> Finally got to install NiFi, got the sample flows running, and read
>>> the Blog article at
>>> https://blogs.apache.org/nifi/entry/basic_dataflow_design.
>>>
>>>> The question was: "Is it possible to have the NiFi service set up and
>>>> running and allow for multiple dataflows to be designed and deployed
>>>> (running) at the same time?"
>>>
>>> I understand the argument being made by the author on how you can use
>>> NiFi to have a single flow with several inputs rather than several
>>> disparate flows. But there are multiple advantages to having NiFi
>>> manage several disparate flows:
>>>
>>> * Managing flows that have very different transformations
>>> * Security: authorization, who has access to what flows, executing
>>> flows as a named user instead of a super user
>>> * Resource management: scheduling resources across disparate flows
>>> * Etc.
>>>
>>> Are there future plans to have the NiFi service set up and manage
>>> multiple data flows?
>>>
>>> Regards
>>> Seshu Adunuthula
