Re: NiFi code re-use

Mark Payne Sun, 13 May 2018 13:19:58 -0700

So I think we have a lot of different concepts going on here. I’ll try to 
provide my thoughts on each one, as I’ve spent a good bit of time thinking 
about each of them over the last year or two :)

Wormhole connections: these would be very nice to have because they would let 
us avoid having lots of ports just to move data further up and down the stacks 
of process groups. But I don’t know that they would adequately scratch the itch 
for functional groups.

Functional groups: I was very gung-ho about implementing these a while back. 
Then I realized there are two really big issues. First, if one group suddenly 
floods the functional group with data, it can cause backlogs that hinder 
processing in the rest of the flow, even though those other flows are otherwise 
completely independent. That is not the end of the world, and it is similar to 
how a single overwhelmed microservice would do the same thing in a microservice 
architecture. The more important issue is “what happens if we try to merge 
data?” Take a MergeRecord processor, for instance. It’s not a 1-in-1-out type 
of thing. Can flowfiles from different sources be merged? Should we allow 
merging at all? I would be a bit worried that this would lead to a lot of 
confusion. That doesn’t mean it can’t be done, but we would have to figure out 
what the semantics are for such a thing and how they would be conveyed clearly 
in the UI.
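
To make the merge question concrete, here is a minimal Python sketch of the 
two possible semantics. It is not how MergeRecord is implemented; the 
`merge_bins` function and the "source" key are hypothetical stand-ins used 
only to show the ambiguity when several callers share one group.

```python
def merge_bins(flowfiles, bin_size, by_source):
    """Bundle flowfiles into fixed-size bins (illustrative only).

    by_source=False: one shared bin, so callers can be mixed in a bundle.
    by_source=True:  one bin per caller, so bundles never mix callers.
    Bins that never fill up are simply left waiting and not returned.
    """
    open_bins = {}
    bundles = []
    for ff in flowfiles:
        key = ff["source"] if by_source else None
        open_bins.setdefault(key, []).append(ff)
        if len(open_bins[key]) == bin_size:
            bundles.append(open_bins.pop(key))
    return bundles


def sources_per_bundle(bundles):
    # List the distinct callers that ended up in each merged bundle.
    return [sorted({ff["source"] for ff in bundle}) for bundle in bundles]


# Two independent callers, A and B, interleaved on the shared group's queue.
incoming = [
    {"source": "A", "data": 1},
    {"source": "B", "data": 2},
    {"source": "A", "data": 3},
    {"source": "B", "data": 4},
]
mixed = sources_per_bundle(merge_bins(incoming, 2, by_source=False))
separated = sources_per_bundle(merge_bins(incoming, 2, by_source=True))
```

With the shared bin, every bundle contains data from both callers; with 
per-caller bins, none does. Either choice could be reasonable, which is why 
the semantics would need to be spelled out in the UI.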

Load-Balanced connections (aka spread the flowfiles across all nodes in the 
cluster on a given connection): very much agree and think we should do this.
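
As a rough picture of what "spread the flowfiles across all nodes" could mean, 
here is a small Python sketch of a round-robin strategy. Round-robin is only 
one assumed strategy, and the function and node names are hypothetical, not 
part of NiFi.

```python
from itertools import cycle


def distribute_round_robin(flowfiles, nodes):
    """Assign each flowfile to the next node in turn (illustrative only)."""
    if not nodes:
        raise ValueError("at least one node is required")
    assignment = {node: [] for node in nodes}
    for ff, node in zip(flowfiles, cycle(nodes)):
        assignment[node].append(ff)
    return assignment


assignment = distribute_round_robin(["a", "b", "c", "d", "e"], ["node1", "node2"])
```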

Non-root-group remote ports: absolutely agree that this is a good idea and we 
should do this as well.

Auto Updates of flows from flow registry: definitely all for this as well. I 
believe that if we do this, then it would subsume the need for the functional 
groups and would be much easier to understand and configure from the UI. It 
would also provide far more power and flexibility by providing the ability to 
upgrade all instances of a flow across many different clusters if desired, not 
just the cluster that you’re working on.

Hopefully this provides some color on some of the design choices that have 
been made and helps spur more thoughts on these subjects.

-Mark

On May 13, 2018, at 3:32 PM, Ed B wrote:

Joe, Aldrin,
Wormholes are a pretty interesting thing. I played around with the idea and 
could get it working, though this approach has downsides.
I'll write an article about it, but you can take a look at it now (I'm 
attaching a template for the root canvas).

What I've found while playing around with this topic is that removing the 
restriction that remote input/output ports live only on the root canvas would 
be nice, but not sufficient.
When we distribute flowfiles across the nodes of the same cluster, that needs 
to be easy to indicate, so that the RPG uses the properties of the cluster 
instead of manually provided ones. I would even go further and add distribution 
capabilities at the relationship level. That would really reduce the number of 
entities we put into our flows, and reduce complexity.

On Sun, May 13, 2018 at 1:20 PM Aldrin Piri wrote:
I think what you highlighted is kind of how I had it worked out in my
mind.  Although maybe I read too much into the description of the proposal
about the framework managing context.  In terms of what we have now, I
think I pictured this to be "Tag this data as from this source" and then
when leaving such a group, the framework would send it back to that "tag."

I will avoid showing my blissful ignorance of all the internals by saying
how it could work but will try to draw the analogs from functionality
currently in place.  I imagined feeding the reference-able group similar to
a virtual funnel of sorts where we use framework knowledge of the
connection to it (and perhaps said connection's source) to track that state
in shipping it back via some slightly smarter port that is, in effect, a
router back to virtual ports (wormholes?) to where the data came from.  Or,
perhaps, in more concrete terms:

We have
* a Process Group that has several input ports (source processors),
* that all feed an UpdateAttribute which tags each flowfile with its source
via EL,
* carry out the functions of the referenceable group,
* with the end of this "block" feeding a RouteOnAttribute on this tag to
an equivalent number of output ports.
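
The tag-and-route pattern in the list above can be sketched in plain Python. 
This is only a simulation of the idea, not NiFi code: the dict-based flowfile, 
the "source.port" attribute name, and all function names are hypothetical.

```python
from collections import defaultdict


def make_flowfile(content):
    # Hypothetical stand-in for a flowfile: attributes plus content.
    return {"attributes": {}, "content": content}


def tag_source(flowfile, port_name):
    # Plays the role of UpdateAttribute: remember the input port used.
    flowfile["attributes"]["source.port"] = port_name
    return flowfile


def shared_logic(flowfile):
    # Placeholder for the common processing the group exists to provide.
    flowfile["content"] = flowfile["content"].upper()
    return flowfile


def route_by_source(flowfiles):
    # Plays the role of RouteOnAttribute: one output per original source.
    outputs = defaultdict(list)
    for ff in flowfiles:
        outputs[ff["attributes"]["source.port"]].append(ff)
    return dict(outputs)


def run_group(inputs):
    # inputs maps an input port name to the contents arriving on that port.
    tagged = [
        tag_source(make_flowfile(content), port)
        for port, contents in inputs.items()
        for content in contents
    ]
    processed = [shared_logic(ff) for ff in tagged]
    return route_by_source(processed)


result = run_group({"in_a": ["x", "y"], "in_b": ["z"]})
```

Each caller gets back only the flowfiles it sent in, but note that all callers 
still share the same queues inside the group, so one busy caller can delay the 
others.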

On Sun, May 13, 2018 at 12:20 PM, Joe Witt wrote:

> Aldrin
>
> Referenceable groups would have to work like a single instance of a PG in
> terms of flow definition but caller specific instances in reality.
> Otherwise you'd have no way to avoid cross contaminating flowfiles from
> various callers as there'd be no caller specific stack (in our case caller
> specific queues and other resources).
>
> The point about keeping versions of instances up to date with registry
> based versioned instances is true but can be addressed with auto updating
> instances of versioned flows which we will need to add anyway.
>
> In either case having PG operate like a callable function reusable across
> flows will likely need to operate as mentioned above.  The former being
> less consistent with the user experience and more work than the latter.
>
> Do you see some other way to make referenceable groups work?
>
> Wormhole connections need to be implemented for sure to help keep flows
> concise.
>
> Thanks
> Joe
>
> On Sun, May 13, 2018, 11:42 AM Aldrin Piri wrote:
>
> > I think the Registry solves part of the issue but even that would lead to
> > duplication of units where we are "copying and pasting" the "code."
> > Versioning would aid in keeping all components in lock step, but will not
> > remedy manual intervention with n-many instances of them.  After one was
> > altered, there would still be the manual process where the PGs would each
> > need to be updated when that change was committed and changes were
> > realized after some time delta.
> >
> > I think the previously discussed Reference-able Process Groups [1] are
> > likely better aligned in conjunction with the Wormhole Connections [2].
> >
> > [1] https://cwiki.apache.org/confluence/display/NIFI/Reference-able+Process+Groups
> > [2] https://cwiki.apache.org/confluence/display/NIFI/Wormhole+Connections
> >
> >
> >
> > On Sat, May 12, 2018 at 10:19 PM, Joe Witt wrote:
> >
> > > Scott
> > >
> > > You're very right, there must be a better way.  The flow registry with
> > > versioned flows is the answer.  You can version control the common
> > > logic and reuse it in as many instances as you need.
> > >
> > > This is like having a flow Class in Java terms where you can
> > > instantiate as many objects of that Class type as you need.
> > >
> > > It was definitely a long missing solution that was addressed in NiFi
> > > 1.5.0 and with the flow registry.
> > >
> > > Also, we should just remove the root group remote port limitation.  It
> > > was an implementation choice long before we had multi tenant auth and
> > > now it no longer makes sense to force root group only.  Still though
> > > the above scenario of versioned flows and the flow registry solves the
> > > main problem.
> > >
> > >
> > > thanks
> > >
> > > On Sat, May 12, 2018, 9:22 PM Charlie Meyer wrote:
> > >
> > > > We do this often by leveraging the variable registry and the
> > > > expression language to make components more dynamic and reusable
> > > >
> > > > -Charlie
> > > >
> > > > On Sat, May 12, 2018, 20:01 scott wrote:
> > > >
> > > > > Hi Devs,
> > > > >
> > > > > I've got a question about an observation I've had while working
> > > > > with NiFi. Is there a better way to re-use process groups, similar
> > > > > to how programming languages reference functions, libraries,
> > > > > classes, or pointers? I know about remote process groups and
> > > > > templates, but neither does exactly what I was thinking. RPGs are
> > > > > great, but I think the output goes to the root canvas level, and
> > > > > you have to have connectors all the way back up your flow
> > > > > hierarchy, and that's not practical. Ultimately, I'm looking for an
> > > > > easy way to re-use process groups that contain common logic in many
> > > > > of my flows, so that I reduce the number of places I have to
> > > > > change.
> > > > >
> > > > > Hopefully that made sense. Appreciate your thoughts.
> > > > >
> > > > > Scott
> > > > >
> > > > >
> > > >
> > >
> >
>
<Wormholes_in_NIFI.xml>
