Seshu,

NiFi has been used extensively as an enterprise (global) wide dataflow
tool.  It supports large teams of people with differing levels of
authorization and access roles operating on the same cluster
supporting vast numbers of different dataflows through the same
system.  Though it has some considerable utility in a classic ETL
sense it wasn't built for classic ETL cases necessarily.  It was built
for the sensor/source to processing to database/warehouse/etc.
problem on a really massive scale.  In many ways it replaces
traditional ETL approaches and in others it complements them.  We
weren't really setting out to replace some particular system and
specifically weren't inspired by the systems mentioned.  Rather, we
set out to fill a gap that we saw: effective 'dataflow' is not just
'data transport'.

Regarding the open flow/save flow approach we definitely considered
that.  We often refer to that as the 'design and deploy model'.  In
many ways that is why we built NiFi in the first place.  There is
definitely value in that model.  But it also imposes a large drag: it
creates a significant disconnect between making a change and seeing
its effect.  That often means slow integration activities, and when
errors occur it isn't easy to find the root cause.  That model
provides a sense of comfort as it is common
and well known and fits a typical software development model.  But it
doesn't necessarily reflect the operational needs that can occur which
require prompt, reliable, verifiable changes to benefit the business.

So the model NiFi supports is that of immediate/real-time changes.  We
can then create templates of those flows, store them in a registry,
and folks can share them.  There are additional things we can do to
support the classic design and deploy model for the cases where it is
truly essential.  And we're also working with folks to explain the
value in moving away from that model when they can.  There is no
single answer for sure but we needed a model that can support both
sides of that story and that is what we have.  We've started from this
base of realtime command and control and are adding support for the
classic model.  But the classic model alone cannot support realtime.

Let's keep the discussion going.  This is good stuff.  We know we can
and should do more to support the classic view when critical but we
want to really understand the 'why' behind it.  In some cases folks
like it because it is what they know and in others it is truly
critical.  We
want to understand those truly critical cases.

Thanks
Joe



On Thu, May 28, 2015 at 8:58 AM, Adunuthula, Seshu <[email protected]> wrote:
> Mark,
>
> Thanks for the response. Are Process Groups the only abstraction for
> maintaining disparate flows? Did you consider the more traditional Open
> Flow/Save Flow approach?
>
> If I start thinking of NiFi as a replacement for enterprise ETL tools like
> Informatica/Ab Initio in the Hadoop world, I would introduce different
> personas ("Administrator": manages and monitors the flows; "ETL Developer":
> develops and deploys the flows; etc.) and build an authorization model
> around them.
>
> It would definitely complicate the model, but would allow for an
> enterprise wide deployment of NiFi. Would love to discuss more.
>
> Regards
> Seshu Adunuthula
>
>
> On 5/27/15, 2:18 PM, "Mark Payne" <[email protected]> wrote:
>
>>Seshu,
>>
>>Thanks for the e-mail and for sharing your concerns!
>>
>>So when we talk about combining multiple sources into a single flow, we
>>don't mean that all data should be combined into a single flow. It
>>absolutely makes sense to sometimes have very disparate flows! In some of
>>the instances we've run, we have dozens or more disparate flows. The idea
>>that I wanted to convey in the article is that just because 2 pieces of
>>data come from different sources does not mean that they should be
>>different flows. But if the data needs to be handled very differently
>>then it absolutely should be two different flows. Those flows then can
>>live side-by-side within the same instance of NiFi (generally in
>>different Process Groups so that the graph is maintainable).
>>
>>The idea of how to handle security and authorization is definitely an
>>ongoing debate. There are really two major approaches here. The first
>>approach, which we offer today, is to have a separate instance of NiFi
>>when different security and authorization is required. Remote Process
>>Groups/site-to-site functionality is then used to send the data between
>>flows. The rub here is that if you have many instances it can be
>>difficult to manage them.
>>
>>The other approach would be to allow the security and authorization to
>>take place at the Process Group level, rather than the Flow Controller
>>level. This would be a very significant amount of work and may make the
>>application more difficult to use, if the administrators then had to
>>manage each group independently. So there are definitely trade-offs to
>>each approach. If you have ideas about how you'd like to see it work,
>>please share them so that we can make NiFi as useful as possible.
>>
>>Thanks
>>-Mark
>>
>>----------------------------------------
>>> From: [email protected]
>>> To: [email protected]
>>> Subject: Thoughts on the Blog Article [Apache NiFi: Thinking
>>>Differently About DataFlow]
>>> Date: Tue, 26 May 2015 20:48:16 +0000
>>>
>>> Hello Folks,
>>>
>>> Finally got to install NiFi and got the sample flows running and read
>>>the Blog article at
>>>https://blogs.apache.org/nifi/entry/basic_dataflow_design.
>>>
>>>> The question was "Is it possible to have NiFi service setup and
>>>>running and allow for multiple dataflows to be designed and deployed
>>>>(running) at the same time?"
>>>
>>> I understand the argument being made by the author on how you can use
>>>NiFi to have a single flow with several inputs compared to several
>>>disparate flows. But there are multiple advantages to having NiFi manage
>>>several disparate flows.
>>>
>>> * Managing Flows that have very different transformations
>>> * Security: Authorization, who has access to what flows, executing
>>>flows as a named user instead of a super user.
>>> * Resource Management: Scheduling the resources across disparate flows
>>> * Etc
>>>
>>> Are there future plans to have a NiFi service set up and manage
>>>multiple data flows?
>>>
>>> Regards
>>> Seshu Adunuthula
>>>
>>>
>>>
>>
>
