Aldrin,

Good question. In this use case, the processor string looks something like this:
SoftNAS-GetFilesystem -> GetFileList -> SplitFileList -> GetFileData -> Remote Process Group(s) -> PutFileData

The SoftNAS-GetFilesystem processor exposes auto-discovered filesystem resources (storage pools, volumes, mount points) and manually configured directories as connections. The user then connects one or more of these resources to GetFileList.

GetFileList uses "find" to scan for all files on the first run, then sends only changes thereafter (as the output of "find": newline-delimited file and directory paths). It also adds a new attribute, "file.rootdir", which records the root directory of the original file/directory paths for later use downstream.

GetFileData takes each incoming file path, identifies whether it is a "file" or a "directory", then reads the file data and sends it onward (for directories, only the permissions are sent). PutFileData accepts the incoming stream of files and directory paths, recreates the directory tree, and writes the files, setting permissions and ownership along the way - resulting in an exact replica of the original directories.

GetFileList is configured by default with a static root directory, like the original GetFile, except that specifying a root directory is optional. If one is not specified, GetFileList takes each "property/trigger" event, sets the root directory property for that request, and processes the request on behalf of that incoming connection.

The GetFileList through PutFileData portion is working today, thanks to everyone's help over the past couple of weeks!

SoftNAS-GetFilesystem automatically discovers filesystems that are available for replication, exposing each as a connection. For each "connected" outgoing relationship, it sends a trigger event with its directory path on whatever schedule it has been configured to run. This drives the replication processes.
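For illustration, the "scan everything the first time, only changes thereafter" behavior of GetFileList can be sketched in plain Java (this is not the actual processor code - it stands in for the "find"-based scan, and the class and method names are illustrative):

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.*;

// Sketch of GetFileList-style change detection: walk a root directory,
// remember each file's last-modified time, and on subsequent scans emit
// only paths that are new or have changed since the previous scan.
public class ChangeScanner {
    private final Map<Path, Long> lastSeen = new HashMap<>();

    /** Returns the paths under root that are new or modified since the last scan. */
    public List<Path> scan(Path root) throws IOException {
        List<Path> changed = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                long mtime = attrs.lastModifiedTime().toMillis();
                Long prev = lastSeen.put(file, mtime);
                if (prev == null || prev != mtime) {
                    changed.add(file); // new, or modified since last scan
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return changed;
    }
}
```

The first scan returns every file (like the initial "find"), and later scans return only changes; the real processor would additionally tag each emitted path with the "file.rootdir" attribute carrying the root it was scanned from.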
Beyond this initial use case, we need to implement APIs for managing all of this, creating process groups from replication templates (by calling the NiFi REST API) as our own "factory". Our user interface and other APIs need the ability to trigger things like a "force sync" command (initiated by the user of our console), which will flow into GetFileList as yet another opcode that tells it to reset and send everything (not just the changes) in order to resync both ends. So we have big plans for the use of opcodes in all kinds of dynamic processors, including things we can't even imagine today but that I'm sure will surface as we go.

Today, NiFi is fantastic for statically-configured data flow processing (its original design scope), where the user knows a priori how to configure everything (manually). We need to automate these configuration tasks, as well as integrate back into other areas of our product that can interact with our NiFi processors. The addition of dynamic property settings and a von Neumann-style processor paradigm makes it so we can do just about anything we need for managing not only data flows but also control flows.

Rick

-----Original Message-----
From: Aldrin Piri [mailto:[email protected]]
Sent: Monday, September 14, 2015 12:51 PM
To: [email protected]
Subject: Re: Using output relationships for user-selection and dynamic property change of another processor

Rick,

At first read, my instinct is that heading down the attribute path may serve you better than connections. What I am interested in hearing is the next step for these flowfiles with associated opcodes. To another processor? To many processors?

As to 1, there should not be an issue with this. From personal experience, I've had processors with a routing config of sorts for various types of data, each driving a separate relationship in numbers similar to yours.
Per 2, the other thread covers the lack of auto-termination, but the framework will handle disabling termination when a relationship is dragged. If something was previously auto-terminated but that relationship is now connected to another component, that "checkbox" is removed.

3: I would like to hear additional details about where files go from here, but what you may be after could be an emerging pattern of use. Need to see how this differentiates itself from the property/attribute approach.

Thanks,
Aldrin

On Mon, Sep 14, 2015 at 9:09 AM, Rick Braddy <[email protected]> wrote:
> Hi,
>
> We have a situation whereby we want a processor that automatically
> discovers some resources (e.g., filesystems), and then auto-publishes
> a number of available relationships as output connections from the
> processor to be used to configure other processors (one processor
> configuring another at run-time). This will facilitate dynamic
> configuration via drag and drop for our customers.
>
> Then, the user can choose one or more of those outputs and connect to
> other processors. The output of the first processor will be used to
> dynamically configure the 2nd processor; e.g., by altering a property
> in the 2nd processor.
>
> To do this in a general-purpose way, in the 2nd processor we're
> considering implementing a "von Neumann" style opcode/operand paradigm,
> whereby the 2nd processor receives an opcode from the first processor,
> then executes that operation. If the opcode requires one or more
> "operands", the processor will retrieve those either from other
> FlowFile attributes or from the FlowFile contents. We have successfully
> used this approach for many years in our product and it has proven to
> be extremely flexible and extensible over time (as has the original
> von Neumann architecture, of course).
>
> Example opcodes we're considering starting with are:
>
> - OPCODE_TRIGGER - simply triggers processing of the target processor
>   (operands vary based upon the type of processor being event-triggered)
>
> - OPCODE_PROPERTY_CHANGE - dynamically modifies the contents of a
>   property in the target processor, with an additional attribute
>   OPERAND_PROPERTY (no further flowfile processing occurs, just the
>   property change)
>
> - OPCODE_PROPERTY_TRIGGER - dynamically modifies the contents of a
>   property, then triggers the processor to process incoming flowfiles
>   as usual (providing an atomic modify/run operation with a single
>   flowfile)
>
> So, my questions are:
>
> 1. Should we expect any issues with many dynamically-created
>    (potentially dozens of) output relationships?
>
> 2. Can we auto-terminate unused connections (or rather, unterminate
>    them when the user selects an output)?
>
> 3. We can implement this in each of our processors' onTrigger()
>    methods, but it would be better if this sort of capability were
>    ultimately part of the framework itself. Is there any interest in
>    seeing this style of processor supported, with standard and
>    user-definable opcodes, within the framework itself?
>
> 4. Is there another, better way to achieve what we are trying to do
>    that we have overlooked?
>
> As always, thanks in advance for your thoughts and guidance.
>
> Rick
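The opcode/operand dispatch described in the quoted message can be sketched in plain Java, with a flowfile modeled as a map of attributes (this is a hedged illustration, not NiFi processor code: the class name, the OPERAND_VALUE attribute, and the handler wiring are assumptions for the sketch - the original message only names the opcodes and OPERAND_PROPERTY):

```java
import java.util.*;
import java.util.function.Consumer;

// Illustrative opcode/operand dispatch. The "opcode" attribute selects a
// handler; handlers may read further operands (e.g. OPERAND_PROPERTY and
// an assumed OPERAND_VALUE) from the remaining flowfile attributes.
public class OpcodeDispatcher {
    private final Map<String, String> properties = new HashMap<>(); // target's mutable properties
    private final Map<String, Consumer<Map<String, String>>> handlers = new HashMap<>();
    int triggerCount = 0; // visible for the usage example below

    public OpcodeDispatcher() {
        handlers.put("OPCODE_TRIGGER", this::onTrigger);
        handlers.put("OPCODE_PROPERTY_CHANGE", this::changeProperty);
        handlers.put("OPCODE_PROPERTY_TRIGGER", attrs -> { // atomic modify-then-run
            changeProperty(attrs);
            onTrigger(attrs);
        });
    }

    public void dispatch(Map<String, String> flowFileAttributes) {
        String opcode = flowFileAttributes.get("opcode");
        Consumer<Map<String, String>> handler = handlers.get(opcode);
        if (handler == null) {
            throw new IllegalArgumentException("unknown opcode: " + opcode);
        }
        handler.accept(flowFileAttributes);
    }

    public String getProperty(String name) { return properties.get(name); }

    private void changeProperty(Map<String, String> attrs) {
        properties.put(attrs.get("OPERAND_PROPERTY"), attrs.get("OPERAND_VALUE"));
    }

    private void onTrigger(Map<String, String> attrs) {
        triggerCount++; // stand-in for the processor's real onTrigger() work
    }
}
```

The table-driven handler map is what makes the scheme extensible: adding a new opcode is one more entry, with no change to the dispatch path - which matches the "things we can't even imagine today" goal Rick describes.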
