Definitely appreciate having as much "control" as possible afforded to each
flowfile. The use cases described here are spot on and I've hit this myself
previously. Any endpoint definition would ideally be configurable from the
flowfile itself via expression language. It's easy enough to hard-code a
static endpoint value (probably the majority use case), but it doesn't
hurt to allow reading configuration from flowfile attributes. So on
principle, and as a general design aesthetic, this is a great idea.

The hard part is that the internals for many processors are anchored to a
specific endpoint. For example, imagine a messaging service that requires a
heavy-weight client to be constructed. Typically, a processor manages only
a single connection object to the remote service. The processor's lifecycle
creates and destroys the underlying client, so the client must be
initialized without any input values from flowfile attributes.

It's these cases that will be harder to refactor. In theory (using my
example), the processor could maintain a pool of connections to different
remote service locations, caching each based on the hostname of the remote
service (or whatever). It's of course possible to create these heavy
connection objects "on demand" based on the attributes of the flowfile
being processed, caching them inside the processor, expiring them after a
period of time, etc. But it adds to the burden that the processor must
maintain (in terms of lines of code and/or complexity of the processor) and
might have effects on resource allocation (memory allocated per client,
etc.).
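
To make the caching idea above a bit more concrete, here is a rough sketch in
plain Java. It is not NiFi API and not how any existing processor works: the
ClientCache class, its method names, and the idle-timeout parameter are all
made up for illustration, and the "client" is just anything AutoCloseable. It
keeps one client per hostname, creates it on first use, and closes clients
that have been idle too long.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: one heavy-weight client per remote host, created on demand.
final class ClientCache<C extends AutoCloseable> {

    // A cached client plus the last time it was handed out.
    private static final class Entry<C> {
        final C client;
        volatile long lastUsedMillis;

        Entry(C client, long nowMillis) {
            this.client = client;
            this.lastUsedMillis = nowMillis;
        }
    }

    private final Map<String, Entry<C>> clients = new ConcurrentHashMap<>();
    private final Function<String, C> factory; // builds a client for a host
    private final long maxIdleMillis;

    ClientCache(Function<String, C> factory, long maxIdleMillis) {
        this.factory = factory;
        this.maxIdleMillis = maxIdleMillis;
    }

    // Returns the cached client for this host, creating it on first use.
    // The host would typically come from a flowfile attribute.
    C get(String host, long nowMillis) {
        Entry<C> entry = clients.computeIfAbsent(
                host, h -> new Entry<>(factory.apply(h), nowMillis));
        entry.lastUsedMillis = nowMillis;
        return entry.client;
    }

    // Closes and evicts clients idle for longer than maxIdleMillis.
    // Simplification: does not guard against a client being closed while
    // another thread is still using it.
    int expireIdle(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<String, Entry<C>>> it = clients.entrySet().iterator();
        while (it.hasNext()) {
            Entry<C> entry = it.next().getValue();
            if (nowMillis - entry.lastUsedMillis > maxIdleMillis) {
                it.remove();
                try {
                    entry.client.close();
                } catch (Exception ignored) {
                    // A real processor would log this.
                }
                closed++;
            }
        }
        return closed;
    }

    int size() {
        return clients.size();
    }
}
```

Even this toy version shows the extra burden: the processor now owns
eviction, shutdown of many clients instead of one, and the thread-safety
questions around closing a client that may still be in use.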

So that's really the tension here. The programming model of the processor,
in many cases, makes it "cleaner" to maintain a single connection facility,
which is why the configuration is a bit more stringent and why many
processors don't enable this more dynamic capability.

But definitely, any processors that can already support a dynamic
configuration model, where flowfile attributes are used to make remote
connections, should be the low-hanging fruit to change first. Past that,
every other processor will probably need to be evaluated individually and
considered for a more sophisticated "on demand" approach to creating or
maintaining its internal clients or components.

On Thu, Oct 20, 2022 at 9:26 AM Kevin Doran <kdo...@apache.org> wrote:

> Hi Rogier,
>
> Thanks for your message. This is an interesting use case. In a way it
> inverts the typical use of NiFi, which is where the flow files are the data
> being moved and the flow logic is in the flow definition / processor
> config. Instead this puts the parameters of the job/workflow into the
> flowfile, which presumably gets enhanced as it moves through the flow so
> you end up with one object/document containing your workflow parameters and
> data/results. Is that accurate?
>
> If I understand correctly, this sounds like a workflow orchestration problem,
> which is similar to but has some subtle differences from data flow
> management. There are tools that try to solve workflow orchestration. Two
> that come to mind are Conductor [1] and Cadence [2]. NiFi can do this, and
> I see plenty of flows that use flowfile attributes to store some control
> signals or values needed for flow logic. But because it's not the core use
> case, I think NiFi developers / extension authors don't think of it when
> building components, which is why they don't think of enabling expression
> language on certain properties.
>
> This isn't really a response to whether NiFi should / should not add
> broader expression language support to properties nor is it an opinion on
> whether NiFi should or should not try to serve the needs of workflow
> orchestration / job execution. Others on this list may have opinions on
> that. I'm just offering my perspective on why this isn't already the case.
> AFAIK, in many cases adding EL support to processor properties is a fairly
> straightforward effort; the challenge, as you point out, is applying it
> broadly to all our existing processors (and new processors as they get
> developed) rather than just one or two.
>
> [1] https://conductor.netflix.com/
> [2] https://cadenceworkflow.io/
>
> Cheers,
> Kevin
>
> On Oct 18, 2022 at 06:05:14, TIMMERMANS Rogier <
> rogier.timmerm...@contractor.voo.be> wrote:
>
> > Hello,
> >
> >
> >
> > Apologies in advance if this is the wrong list to send this type of query
> > to.
> >
> > After a short discussion with Chris Sampson on
> > https://issues.apache.org/jira/browse/NIFI-8214, he proposed sending an
> > email to this list; I hope it finds you well.
> >
> >
> >
> > We (several of my colleagues) sometimes use a pattern where we build a
> > NiFi flow that gets initiated by a short JSON configuration file; the
> > initial input file (or generated flowfile) contains simple configuration
> > data for the rest of the flow and sets up things like Remote paths, users,
> > testcase IDs, endpoints to hit, etc., as a file is easier to manage,
> > maintain and archive than a graphical set of linked components with logic.
> >
> > This setup allows relatively easy building of a generalized graphical
> > logic flow with (less issue below) minimal branching, as ‘controlling
> > where stuff goes or talks to’ is effectively in the input file.
> > With some processing and splitting upfront, each flowfile will effectively
> > go down its intended path and carry the needed values as attributes for
> > downstream usage, where they act as dynamic configuration/parameters (or
> > whatever terminology you wish to apply), and we are also able to use
> > calculations/logic to pick output locations, etc. To make it more
> > concrete, just imagine a use case where a simple attribute value decides
> > between a production system endpoint and a test system endpoint, where all
> > logic is identical and the only difference is an input and/or output
> > location. Granted, the example is simple, but it does demonstrate how this
> > can get out of hand with multiple branching locations and duplicated
> > components.
> >
> > We do face the issue that several NiFi components do not support
> > attributes (or expression language) on several key configuration
> > properties (some typical example components with this limitation: FTP
> > components, ElasticPut, and there surely will be a few others based on
> > similar base classes which I haven’t used/found yet); this forces us into
> > building dedicated routes plus duplicated patterns of components, because
> > a simple destination URI or a Remote Input/Output path cannot be
> > dynamically adjusted but can only be set through pre-defined constants
> > (parameters or the VarRegistry).
> > This reduces flexibility quite a bit and introduces a lot of complexity to
> > the graphical view, as this obviously means more graphical clutter on the
> > worksheet, a lot more connection lines, additional branching, etc.
> > A minor change in logic or a property needs to be touched in several
> > places, making this relatively error-prone to maintain as well.
> >
> >
> >
> > I’d like to propose extending, by default, fields like server
> > endpoints, ports, Remote Paths, and Input Paths (all the similar types of
> > fields that are limited in several components) with a default capability
> > of using attributes/expression language (and by extension parameters and
> > the variable registry); this will increase flexibility for many components
> > and could de-dupe a lot of graphical flow logic and components.
> >
> > Best Regards,
> >
> >
> >
> > <http://www.voo.be/>
> >
> > *Rogier Timmermans*
> >
> > *Lead Engineer VOD & OTT*
> >
> > 46 Rue Jean Jaures, B-4030 Ans
> >
> > Tel: +31 (0) 6 5428 4192
> >
> > rogier.timmerm...@contractor.voo.be
> >
>
