Seshu,

NiFi has been used extensively as an enterprise-wide (global) dataflow tool. It supports large teams of people with differing levels of authorization and access roles operating on the same cluster, supporting vast numbers of different dataflows through the same system. Though it has considerable utility in a classic ETL sense, it wasn't necessarily built for classic ETL cases. It was built for the sensor/source to processing to database/warehouse/etc. problem on a really massive scale. In many ways it replaces traditional ETL approaches, and in others it complements them. We weren't setting out to replace some particular system, and we specifically weren't inspired by the systems mentioned. Rather, we set out to fill a gap that we saw: effective 'dataflow' is not just 'data transport'.
Regarding the open flow/save flow approach, we definitely considered that. We often refer to it as the 'design and deploy model'. In many ways that is why we built NiFi in the first place. There is definitely value in that model. But it also imposes a large drag: it creates a significant disconnect between making a change and seeing its effect. That often means slow integration activities, and when errors occur it isn't easy to find the root cause. That model provides a sense of comfort because it is common, well known, and fits a typical software development model. But it doesn't necessarily reflect the operational needs that can arise, which require prompt, reliable, verifiable changes to benefit the business. So the model NiFi supports is one of immediate, real-time changes. We can then create templates of those flows, store them in a registry, and folks can share them. There are additional things we can do to support the classic design and deploy model for the cases where it is truly essential, and we're also working with folks to explain the value of moving away from that model when they can. There is no single answer, for sure, but we needed a model that can support both sides of that story, and that is what we have. We've started from this base of real-time command and control and are adding support for the classic model. But the classic model alone cannot support real-time.

Let's keep the discussion going. This is good stuff. We know we can and should do more to support the classic view when it is critical, but we want to really understand the 'why' behind it. In some cases folks like it because it is what they know, and in others it is truly critical. We want to understand those truly critical cases.

Thanks
Joe

On Thu, May 28, 2015 at 8:58 AM, Adunuthula, Seshu <[email protected]> wrote:
> Mark,
>
> Thanks for the response.
> Are Process Groups the only abstraction for maintaining disparate flows? Did you consider the more traditional Open Flow/Save Flow approach?
>
> If I start thinking of NiFi as a replacement for enterprise ETL tools like Informatica/Ab Initio in the Hadoop world, I would introduce different personas, such as "Administrator" (manages and monitors the flows) and "ETL Developer" (develops and deploys the flows), and build an authorization model around them.
>
> It would definitely complicate the model, but it would allow for an enterprise-wide deployment of NiFi. Would love to discuss more.
>
> Regards
> Seshu Adunuthula
>
>
> On 5/27/15, 2:18 PM, "Mark Payne" <[email protected]> wrote:
>
>>Seshu,
>>
>>Thanks for the e-mail and for sharing your concerns!
>>
>>So when we talk about combining multiple sources into a single flow, we don't mean that all data should be combined into a single flow. It absolutely makes sense to sometimes have very disparate flows! In some of the instances we've run, we have dozens or more disparate flows. The idea that I wanted to convey in the article is that just because two pieces of data come from different sources does not mean that they should be different flows. But if the data needs to be handled very differently, then it absolutely should be two different flows. Those flows can then live side by side within the same instance of NiFi (generally in different Process Groups so that the graph is maintainable).
>>
>>The idea of how to handle security and authorization is definitely an ongoing debate. There are really two major approaches here. The first approach, which we offer today, is to have a separate instance of NiFi when different security and authorization is required. Remote Process Groups/site-to-site functionality is then used to send the data between flows. The rub here is that if you have many instances, it can be difficult to manage them.
>>
>>The other approach would be to allow security and authorization to take place at the Process Group level rather than the Flow Controller level. This would be a very significant amount of work and might make the application more difficult to use if administrators then had to manage each group independently. So there are definitely trade-offs to each approach. If you have ideas about how you'd like to see it work, please share them so that we can make NiFi as useful as possible.
>>
>>Thanks
>>-Mark
>>
>>----------------------------------------
>>> From: [email protected]
>>> To: [email protected]
>>> Subject: Thoughts on the Blog Article [Apache NiFi: Thinking Differently About DataFlow]
>>> Date: Tue, 26 May 2015 20:48:16 +0000
>>>
>>> Hello Folks,
>>>
>>> I finally got NiFi installed, got the sample flows running, and read the blog article at https://blogs.apache.org/nifi/entry/basic_dataflow_design.
>>>
>>>> The question was "Is it possible to have NiFi service setup and running and allow for multiple dataflows to be designed and deployed (running) at the same time?"
>>>
>>> I understand the argument the author makes about how you can use NiFi to have a single flow with several inputs rather than several disparate flows. But there are multiple advantages to having NiFi manage several disparate flows:
>>>
>>> * Managing flows that have very different transformations
>>> * Security: authorization (who has access to which flows), and executing flows as a named user instead of a super user
>>> * Resource management: scheduling resources across disparate flows
>>> * Etc.
>>>
>>> Are there future plans to have the NiFi service set up and manage multiple dataflows?
>>>
>>> Regards
>>> Seshu Adunuthula
