Joe,

Thanks for the detailed response. Let me spend some time understanding the model of Process Groups and templates. I guess it is a switch from the classic model, so it will take time to get used to…

Regards
Seshu

On 5/28/15, 6:45 AM, "Joe Witt" <[email protected]> wrote:

>Seshu,
>
>NiFi has been used extensively as an enterprise-wide (global) dataflow
>tool. It supports large teams of people with differing levels of
>authorization and access roles operating on the same cluster,
>supporting vast numbers of different dataflows through the same
>system. Though it has considerable utility in a classic ETL sense, it
>wasn't necessarily built for classic ETL cases. It was built for the
>sensor/source to processing to database/warehouse/etc. problem on a
>really massive scale. In many ways it replaces traditional ETL
>approaches, and in others it complements them. We weren't really
>setting out to replace some particular system and specifically weren't
>inspired by the systems mentioned. Rather, we set out to fill a gap
>that we saw: effective 'dataflow' is not just 'data transport'.
>
>Regarding the open flow/save flow approach, we definitely considered
>that. We often refer to that as the 'design and deploy model'. In
>many ways that is why we built NiFi in the first place. There is
>definitely value in that model. But it also imposes a large drag: it
>creates a significant disconnect between making a change and seeing
>its effect. That often means slow integration activities and, when
>errors occur, it isn't easy to find the root cause. That model
>provides a sense of comfort, as it is common and well known and fits a
>typical software development model. But it doesn't necessarily reflect
>the operational needs that can occur, which require prompt, reliable,
>verifiable changes to benefit the business.
>
>So the model NiFi supports is that of immediate/real-time changes. We
>can then create templates of those flows, store them in a registry,
>and folks could share them. There are additional things we can do to
>support the classic design and deploy model for the cases where it is
>truly essential.
>And we're also working with folks to explain the value in moving away
>from that model when they can. There is no single answer, for sure,
>but we needed a model that can support both sides of that story, and
>that is what we have. We've started from this base of realtime command
>and control and are adding support for the classic model. But the
>classic model alone cannot support realtime.
>
>Let's keep the discussion going. This is good stuff. We know we can
>and should do more to support the classic view when critical, but we
>want to really understand the 'why' behind it. In some cases folks
>like it because it is what they know, and in others it is truly
>critical. We want to understand those truly critical cases.
>
>Thanks
>Joe
>
>On Thu, May 28, 2015 at 8:58 AM, Adunuthula, Seshu <[email protected]>
>wrote:
>> Mark,
>>
>> Thanks for the response. Are Process Groups the only abstraction for
>> maintaining disparate flows? Did you consider the more traditional
>> Open Flow/Save Flow approach?
>>
>> If I start thinking of NiFi as a replacement for enterprise ETL tools
>> like Informatica/Ab Initio in the Hadoop world, I would introduce
>> different personas ("Administrator": manages and monitors the flows;
>> "ETL Developer": develops and deploys the flows; etc.) and build an
>> authorization model around them.
>>
>> It would definitely complicate the model, but would allow for an
>> enterprise-wide deployment of NiFi. Would love to discuss more.
>>
>> Regards
>> Seshu Adunuthula
>>
>> On 5/27/15, 2:18 PM, "Mark Payne" <[email protected]> wrote:
>>
>>>Seshu,
>>>
>>>Thanks for the e-mail and for sharing your concerns!
>>>
>>>So when we talk about combining multiple sources into a single flow, we
>>>don't mean that all data should be combined into a single flow. It
>>>absolutely makes sense to sometimes have very disparate flows! In some
>>>of the instances we've run, we have dozens or more disparate flows.
>>>The idea that I wanted to convey in the article is that just because
>>>2 pieces of data come from different sources does not mean that they
>>>should be different flows. But if the data needs to be handled very
>>>differently, then it absolutely should be two different flows. Those
>>>flows can then live side-by-side within the same instance of NiFi
>>>(generally in different Process Groups so that the graph is
>>>maintainable).
>>>
>>>The idea of how to handle security and authorization is definitely an
>>>ongoing debate. There are really two major approaches here. The first
>>>approach, which we offer today, is to have a separate instance of NiFi
>>>when different security and authorization is required. Remote Process
>>>Groups/site-to-site functionality is then used to send the data between
>>>flows. The rub here is that if you have many instances it can be
>>>difficult to manage them.
>>>
>>>The other approach would be to allow the security and authorization to
>>>take place at the Process Group level, rather than the Flow Controller
>>>level. This would be a very significant amount of work and may make the
>>>application more difficult to use if the administrators then had to
>>>manage each group independently. So there are definitely trade-offs to
>>>each approach. If you have ideas about how you'd like to see it work,
>>>please share them so that we can make NiFi as useful as possible.
>>>
>>>Thanks
>>>-Mark
>>>
>>>----------------------------------------
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: Thoughts on the Blog Article [Apache NiFi: Thinking
>>>> Differently About DataFlow]
>>>> Date: Tue, 26 May 2015 20:48:16 +0000
>>>>
>>>> Hello Folks,
>>>>
>>>> Finally got to install NiFi and got the sample flows running and read
>>>> the Blog article at
>>>> https://blogs.apache.org/nifi/entry/basic_dataflow_design.
>>>>
>>>>> The question was "Is it possible to have NiFi service setup and
>>>>> running and allow for multiple dataflows to be designed and deployed
>>>>> (running) at the same time?"
>>>>
>>>> I understand the argument being made by the author on how you can use
>>>> NiFi to have a single flow with several inputs compared to several
>>>> disparate flows. But there are multiple advantages to having NiFi
>>>> manage several disparate flows.
>>>>
>>>> * Managing flows that have very different transformations
>>>> * Security: authorization, who has access to what flows, executing
>>>>   flows as a named user instead of a super user
>>>> * Resource management: scheduling the resources across disparate flows
>>>> * Etc.
>>>>
>>>> Are there future plans to have a NiFi service set up and manage
>>>> multiple data flows?
>>>>
>>>> Regards
>>>> Seshu Adunuthula
