Joe, 

Thanks for the detailed response. Let me spend some time understanding the
model of ProcessGroups and templates. I guess it is a switch from the
classic model, so it would take time to get used to…


Regards
Seshu

On 5/28/15, 6:45 AM, "Joe Witt" <[email protected]> wrote:

>Seshu,
>
>NiFi has been used extensively as an enterprise (global) wide dataflow
>tool.  It supports large teams of people with differing levels of
>authorization and access roles operating on the same cluster
>supporting vast numbers of different dataflows through the same
>system.  Though it has some considerable utility in a classic ETL
>sense it wasn't built for classic ETL cases necessarily.  It was built
>for the sensor/source to processing to database/warehouse/etc.
>problem on a really massive scale.  In many ways it replaces
>traditional ETL approaches and in others it complements them.  We
>weren't really setting out to replace some particular system and
>specifically weren't inspired by the systems mentioned.  But rather we
>set out to fill a gap that we saw.  Specifically, effective
>'dataflow' is not just 'data transport'.
>
>Regarding the open flow/save flow approach we definitely considered
>that.  We often refer to that as the 'design and deploy model'.  In
>many ways that is why we built nifi in the first place.  There is
>definitely value in that model.  But it also imposes a large drag:
>it creates a significant disconnect between making a change and
>seeing its effect.  That often means slow
>integration activities and when errors occur it isn't easy to find
>root cause.  That model provides a sense of comfort as it is common
>and well known and fits a typical software development model.  But it
>doesn't necessarily reflect the operational needs that can occur which
>require prompt, reliable, verifiable changes to benefit the business.
>
>So the model NiFi supports is that of immediate/real-time changes.  We
>can then create templates of those flows, store them in a registry,
>and folks could share them.  There are additional things we can do to
>support the classic design and deploy model for the cases where it is
>truly essential.  And we're also working with folks to explain the
>value in moving away from that model when they can.  There is no
>single answer for sure but we needed a model that can support both
>sides of that story and that is what we have.  We've started from this
>base of realtime command and control and are adding support for the
>classic model.  But the classic model alone cannot support realtime.
>
>Let's keep the discussion going.  This is good stuff.  We know we can
>and should do more to support the classic view when critical but we
>want to really understand the 'why' behind it.  In some cases folks
>like it because it is what they know and in others it is truly critical.  We
>want to understand those truly critical cases.
>
>Thanks
>Joe
>
>
>
>On Thu, May 28, 2015 at 8:58 AM, Adunuthula, Seshu <[email protected]>
>wrote:
>> Mark,
>>
>> Thanks for the response. Is Process Groups the only abstraction for
>> maintaining disparate flows? Did you consider the more traditional Open
>> Flow/Save Flow approach?
>>
>> If I start thinking of NiFi as a replacement for enterprise ETL tools like
>> Informatica/Ab Initio in the Hadoop world, I would introduce different
>> personas, "Administrator": manages and monitors the flows, "ETL Developer":
>> develops and deploys the flows, etc., and build an authorization model
>> around it.
>>
>> It would definitely complicate the model, but would allow for an
>> enterprise wide deployment of NiFi. Would love to discuss more.
>>
>> Regards
>> Seshu Adunuthula
>>
>>
>> On 5/27/15, 2:18 PM, "Mark Payne" <[email protected]> wrote:
>>
>>>Seshu,
>>>
>>>Thanks for the e-mail and for sharing your concerns!
>>>
>>>So when we talk about combining multiple sources into a single flow, we
>>>don't mean that all data should be combined into a single flow. It
>>>absolutely makes sense to sometimes have very disparate flows! In some
>>>of the instances we've run, we have dozens or more disparate flows. The
>>>idea that I wanted to convey in the article is that just because 2 pieces of
>>>data come from different sources does not mean that they should be
>>>different flows. But if the data needs to be handled very differently
>>>then it absolutely should be two different flows. Those flows then can
>>>live side-by-side within the same instance of NiFi (generally in
>>>different Process Groups so that the graph is maintainable).
>>>
>>>The idea of how to handle security and authorization is definitely an
>>>ongoing debate. There are really two major approaches here. The first
>>>approach, which we offer today, is to have a separate instance of NiFi
>>>when different security and authorization is required. Remote Process
>>>Groups/site-to-site functionality is then used to send the data between
>>>flows. The rub here is that if you have many instances it can be
>>>difficult to manage them.
>>>
>>>The other approach would be to allow the security and authorization to
>>>take place at the Process Group level, rather than the Flow Controller
>>>level. This would be a very significant amount of work and may make the
>>>application more difficult to use, if the administrators then had to
>>>manage each group independently. So there are definitely trade-offs to
>>>each approach. If you have ideas about how you'd like to see it work,
>>>please share them so that we can make NiFi as useful as possible.
>>>
>>>Thanks
>>>-Mark
>>>
>>>----------------------------------------
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: Thoughts on the Blog Article [Apache NiFi: Thinking
>>>>Differently About DataFlow]
>>>> Date: Tue, 26 May 2015 20:48:16 +0000
>>>>
>>>> Hello Folks,
>>>>
>>>> Finally got to install NiFi and got the sample flows running and read
>>>>the Blog article at
>>>>https://blogs.apache.org/nifi/entry/basic_dataflow_design.
>>>>
>>>>> The question was "Is it possible to have NiFi service setup and
>>>>>running and allow for multiple dataflows to be designed and deployed
>>>>>(running) at the same time?"
>>>>
>>>> I understand the argument being made by the author on how you can use
>>>>NiFi to have a single flow with several inputs compared to several
>>>>disparate flows. But there are multiple advantages to having NiFi
>>>>manage several disparate flows.
>>>>
>>>> * Managing Flows that have very different transformations
>>>> * Security: Authorization, who has access to what flows, executing
>>>>flows as a named user instead of a super user.
>>>> * Resource Management: Scheduling the resources across disparate flows
>>>> * Etc
>>>>
>>>> Are there future plans to have NiFi Service setup and manage multiple
>>>>data flows?
>>>>
>>>> Regards
>>>> Seshu Adunuthula
>>>>
>>>>
>>>>
>>>
>>
