Aldrin,

Good question. In this use case, the processor string looks something like this:
SoftNAS-GetFilesystem -> GetFileList -> SplitFileList -> GetFileData -> Remote Process Group(s) -> PutFileData

The SoftNAS-GetFilesystem processor exposes auto-discovered filesystem resources (storage pools, volumes, mount points) and manually configured directories as connections. The user then connects one or more of these resources to GetFileList.

GetFileList uses "find" to scan for all files on the first run, then sends only changes thereafter (as the output of "find": newline-delimited file and directory paths). It also adds a new attribute, "file.rootdir", which records the root directory of the original file/directory paths for later use downstream.

GetFileData takes each incoming file path, identifies whether it is a "file" or a "directory", then reads the file data and sends it onward (for directories, only the permissions are sent). PutFileData accepts the incoming stream of files and directory paths, recreates the directory tree, and writes the files, setting permissions and ownership along the way - resulting in an exact replica of the original directories.

GetFileList is configured by default with a static root directory, like the original GetFile, except that specifying a root directory is optional. If one is not specified, GetFileList takes each "property/trigger" event, sets the root directory property for that request, and processes the request on behalf of that incoming connection.

The GetFileList through PutFileData portion is working today, thanks to everyone's help over the past couple of weeks!

SoftNAS-GetFilesystem automatically discovers filesystems that are available for replication, exposing each as a connection. For each "connected" outgoing relationship, it sends a trigger event with its directory path on whatever schedule it has been configured to run. This drives the replication processes.
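For illustration, the "scan everything the first time, only changes thereafter" behavior of GetFileList can be sketched in plain Java (this is not the actual processor code - it stands in for the "find"-based scan, and the class and method names are illustrative):

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.*;

// Sketch of GetFileList-style change detection: walk a root directory,
// remember each file's last-modified time, and on subsequent scans emit
// only paths that are new or have changed since the previous scan.
public class ChangeScanner {
    private final Map<Path, Long> lastSeen = new HashMap<>();

    /** Returns the paths under root that are new or modified since the last scan. */
    public List<Path> scan(Path root) throws IOException {
        List<Path> changed = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                long mtime = attrs.lastModifiedTime().toMillis();
                Long prev = lastSeen.put(file, mtime);
                if (prev == null || prev != mtime) {
                    changed.add(file); // new, or modified since last scan
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return changed;
    }
}
```

The first scan returns every file (like the initial "find"), and later scans return only changes; the real processor would additionally tag each emitted path with the "file.rootdir" attribute carrying the root it was scanned from.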
Beyond this initial use case, we need to implement APIs for managing all of this, creating process groups from replication templates (by calling the NiFi REST API) as our own "factory". Our user interface and other APIs need the ability to trigger things like a "force sync" command (initiated by the user of our console), which will flow into GetFileList as yet another opcode that tells it to reset and send everything (not just the changes) in order to resync both ends. So we have big plans for the use of opcodes in all kinds of dynamic processors, including things we can't even imagine today but that I'm sure will surface as we go.

Today, NiFi is fantastic for statically-configured data flow processing (its original design scope), where the user knows a priori how to configure everything (manually). We need to automate these configuration tasks, as well as integrate back into other areas of our product that can interact with our NiFi processors. The addition of dynamic property settings and a von Neumann-style processor paradigm makes it so we can do just about anything we need for managing not only data flows but also control flows.

Rick

-----Original Message-----
From: Aldrin Piri [mailto:[email protected]]
Sent: Monday, September 14, 2015 12:51 PM
To: [email protected]
Subject: Re: Using output relationships for user-selection and dynamic property change of another processor

Rick,

At first read, my instinct is that heading down the attribute path may serve you better than connections. What I am interested in hearing is the next step for these flowfiles with associated opcodes. To another processor? To many processors?

As to 1, there should not be an issue with this. From personal experience, I've had processors with a routing config of sorts for various types of data, each driving a separate relationship in numbers similar to yours.
Per 2, the other thread covers the lack of auto-termination, but the framework will handle disabling termination when a relationship is dragged. If something was previously auto-terminated but that relationship is now connected to another component, that "checkbox" is removed.

3: I would like to hear additional details about where files go from here, but what you may be after could be an emerging pattern of use. Need to see how this differentiates itself from the property/attribute approach.

Thanks,
Aldrin

On Mon, Sep 14, 2015 at 9:09 AM, Rick Braddy <[email protected]> wrote:
> Hi,
>
> We have a situation whereby we want a processor that automatically
> discovers some resources (e.g., filesystems), and then auto-publishes
> a number of available relationships as output connections from the
> processor to be used to configure other processors (one processor
> configuring another at run-time). This will facilitate dynamic
> configuration via drag and drop for our customers.
>
> Then, the user can choose one or more of those outputs and connect to
> other processors. The output of the first processor will be used to
> dynamically configure the 2nd processor; e.g., by altering a property
> in the 2nd processor.
>
> To do this in a general-purpose way, in the 2nd processor we're
> considering implementing a "von Neumann" style opcode/operand paradigm,
> whereby the 2nd processor receives an opcode from the first processor,
> then executes that operation. If the opcode requires one or more
> "operands", the processor will retrieve those either from other
> FlowFile attributes or from the FlowFile contents. We have successfully
> used this approach for many years in our product and it has proven to
> be extremely flexible and extensible over time (as has the original
> von Neumann architecture, of course).
>
> Example opcodes we're considering starting with are:
>
> - OPCODE_TRIGGER - simply triggers processing of the target processor
>   (operands vary based upon the type of processor being event-triggered)
>
> - OPCODE_PROPERTY_CHANGE - dynamically modifies the contents of a
>   property in the target processor, with an additional attribute
>   OPERAND_PROPERTY (no further flowfile processing occurs, just the
>   property change)
>
> - OPCODE_PROPERTY_TRIGGER - dynamically modifies the contents of a
>   property, then triggers the processor to process incoming flowfiles
>   as usual (providing an atomic modify/run operation with a single
>   flowfile)
>
> So, my questions are:
>
> 1. Should we expect any issues with many dynamically-created
>    (potentially dozens of) output relationships?
>
> 2. Can we auto-terminate unused connections (or rather, unterminate
>    them when the user selects an output)?
>
> 3. We can implement this in each of our processors' onTrigger()
>    methods, but it would be better if this sort of capability were
>    ultimately part of the framework itself. Is there any interest in
>    seeing this style of processor supported, with standard and
>    user-definable opcodes, within the framework itself?
>
> 4. Is there another, better way to achieve what we are trying to do
>    that we have overlooked?
>
> As always, thanks in advance for your thoughts and guidance.
>
> Rick
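The opcode/operand dispatch described in the quoted message can be sketched in plain Java, with a flowfile modeled as a map of attributes (this is a hedged illustration, not NiFi processor code: the class name, the OPERAND_VALUE attribute, and the handler wiring are assumptions for the sketch - the original message only names the opcodes and OPERAND_PROPERTY):

```java
import java.util.*;
import java.util.function.Consumer;

// Illustrative opcode/operand dispatch. The "opcode" attribute selects a
// handler; handlers may read further operands (e.g. OPERAND_PROPERTY and
// an assumed OPERAND_VALUE) from the remaining flowfile attributes.
public class OpcodeDispatcher {
    private final Map<String, String> properties = new HashMap<>(); // target's mutable properties
    private final Map<String, Consumer<Map<String, String>>> handlers = new HashMap<>();
    int triggerCount = 0; // visible for the usage example below

    public OpcodeDispatcher() {
        handlers.put("OPCODE_TRIGGER", this::onTrigger);
        handlers.put("OPCODE_PROPERTY_CHANGE", this::changeProperty);
        handlers.put("OPCODE_PROPERTY_TRIGGER", attrs -> { // atomic modify-then-run
            changeProperty(attrs);
            onTrigger(attrs);
        });
    }

    public void dispatch(Map<String, String> flowFileAttributes) {
        String opcode = flowFileAttributes.get("opcode");
        Consumer<Map<String, String>> handler = handlers.get(opcode);
        if (handler == null) {
            throw new IllegalArgumentException("unknown opcode: " + opcode);
        }
        handler.accept(flowFileAttributes);
    }

    public String getProperty(String name) { return properties.get(name); }

    private void changeProperty(Map<String, String> attrs) {
        properties.put(attrs.get("OPERAND_PROPERTY"), attrs.get("OPERAND_VALUE"));
    }

    private void onTrigger(Map<String, String> attrs) {
        triggerCount++; // stand-in for the processor's real onTrigger() work
    }
}
```

The table-driven handler map is what makes the scheme extensible: adding a new opcode is one more entry, with no change to the dispatch path - which matches the "things we can't even imagine today" goal Rick describes.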
