Hello

You will first want to create a JIRA describing the work/idea being done. Then in the commit log be sure to reference NIFI-XXXX.
Take a look here for a helpful guide on how best to help the community land contributions:
https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide

Thanks
Joe

On Tue, Jan 31, 2017 at 10:17 AM, Irizarry Jr., Nazario <[email protected]> wrote:
> I am about to submit a PR for an implementation of the run-once scheduling.
> There is no outstanding JIRA ticket on this so what kind of NIFI-XXXX or
> other labeling should I put into the title of the PR?
>
> Thanks,
>
> Naz Irizarry
> MITRE Corp.
> 617-893-0074
>
>> On Jan 12, 2017, at 3:55 PM, Irizarry Jr., Nazario <[email protected]> wrote:
>>
>> I think it is a matter of the model in one's head. If one thinks of a
>> continuous activation paradigm the green arrow versus red square indicate
>> what you point out. On the other hand in an ad-hoc run-once paradigm the
>> green arrow is a nice succinct indicator of what has not run yet. In an
>> analytics environment processing can take minutes to hours for some
>> processors. As processing goes on the processors with the remaining green
>> arrows indicate what is left to complete in the “visual script.”
>>
>> Consider the following example. Say there are five processors. The
>> first processor, say A, makes a query and gets data. Depending on what I
>> know about today’s input to A the output should be directed to B1, B2, B3,
>> or B4. The B's are actually variations on a particular analytic algorithm
>> and most of the time only one of them needs to be used. On one day (based
>> on external knowledge) I click on A and B1 and then the Start arrow. On
>> another day I modify the query, click on A and B2 and then click on the
>> Start arrow, etc. Clearly I could have four flows and I could start/stop
>> entire flows. But, as the number of processing stages increases and the
>> number of processing alternatives increases at each stage the combinatorial
>> growth makes distinct flows painful to manage.
>> Sometimes it is easier to have one all-encompassing flow and then allow the
>> analyst to shift-click the portions they want to invoke for the next “run.”
>>
>> Naz Irizarry
>> MITRE Corp.
>> 617-893-0074
>>
>>> On Jan 12, 2017, at 2:14 PM, Joe Witt <[email protected]> wrote:
>>>
>>> Naz
>>>
>>> The green arrow vs red square says "scheduled to execute" vs "not
>>> scheduled to execute". For most processors, such as those which take
>>> input flow files from a connection, even if they're scheduled to run
>>> they're not going to be executed unless there is work to do (data
>>> sitting in the queue) and space available (on all destination
>>> relationships). Because of this I'm suggesting to consider just
>>> leaving them all scheduled to execute even though they won't actually
>>> be doing anything most of the time. The stats on each component tell
>>> you how many times it was actually invoked and how much data it
>>> processed, etc. So you'll see that they're not doing anything most
>>> of the time.
>>>
>>> You mentioned not wanting to have to do anything manual yet run once
>>> would be a manual construct, right?
>>>
>>> I don't mean to suggest I'm closed off to the idea of a run once
>>> concept; I just really want to understand your use case better.
>>>
>>> Thanks
>>> Joe
>>>
>>> On Thu, Jan 12, 2017 at 2:11 PM, Irizarry Jr., Nazario <[email protected]>
>>> wrote:
>>>> Correction, that was the processor scheduler’s stopProcessor() method that
>>>> needs to be invoked so the UI shows that the processor is stopped.
>>>>
>>>> Naz Irizarry
>>>> MITRE Corp.
>>>> 617-893-0074
>>>>
>>>>> On Jan 12, 2017, at 2:08 PM, Irizarry Jr., Nazario <[email protected]> wrote:
>>>>>
>>>>> Yes, we found that to keep the UI in sync (make sure it looks stopped
>>>>> after it runs once) the flow controller's stopProcessor() method has to
>>>>> be called.
>>>>>
>>>>> Naz Irizarry
>>>>> MITRE Corp.
>>>>> 617-893-0074
>>>>>
>>>>> On Jan 12, 2017, at 1:41 PM, Brandon DeVries
>>>>> <[email protected]> wrote:
>>>>>
>>>>> I think answering Joe's question is step one. However, I was curious and
>>>>> tried something:
>>>>>
>>>>> public void onTrigger(...){
>>>>>     // Do nothing once the flag has been cleared by a previous run.
>>>>>     if(!isScheduled()){
>>>>>         return;
>>>>>     }
>>>>>     FlowFile flowFile = session.get();
>>>>>     if (flowFile == null){
>>>>>         return;
>>>>>     }
>>>>>     session.transfer(flowFile, REL_SUCCESS);
>>>>>     // Clear the flag so later triggers are no-ops until a stop / start.
>>>>>     updateScheduledFalse();
>>>>> }
>>>>>
>>>>> This processes one FlowFile per "scheduling". I.e., one FlowFile goes
>>>>> through, and you need to stop / start to get another. I'm not 100% sure
>>>>> that nothing else would ever call the "built in" updateScheduled* methods,
>>>>> but worst case the processor could maintain its own flag. Also, for what
>>>>> it's worth, calling updateScheduledFalse() doesn't "stop" the processor on
>>>>> the graph... as Oleg mentions, this (or something like this) could
>>>>> potentially be visually confusing.
>>>>>
>>>>> I'm not sure how this fits in a production system, but this +
>>>>> GenerateFlowFile and some backpressure seems possibly useful for
>>>>> debugging. I know I've faked this behavior with a GenerateFlowFile w/ run
>>>>> schedule "1 day" or something before... then again, maybe it would be best
>>>>> to not create something that could be confusing / misused in a production
>>>>> system.
>>>>>
>>>>> Brandon
>>>>>
>>>>> On Thu, Jan 12, 2017 at 1:02 PM Joe Witt
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Naz,
>>>>>
>>>>> Why not just leave all the processes running? If the data only
>>>>> arrives periodically that is ok, right?
>>>>>
>>>>> Thanks
>>>>> Joe
>>>>>
>>>>> On Thu, Jan 12, 2017 at 10:54 AM, Irizarry Jr., Nazario
>>>>> <[email protected]>
>>>>> wrote:
>>>>> On a project that I am on we have been looking at using NiFi for
>>>>> orchestrations that are invoked infrequently. For example, once a month a
>>>>> new data input product becomes available and then one wants to run it
>>>>> through a set of processing steps that can be nicely implemented using
>>>>> NiFi processors. However, using the interval or cron scheduling for this
>>>>> purpose begins to get cumbersome after a while with the need to start and
>>>>> manually stop these occasional flows.
>>>>>
>>>>> It would be fairly easy to add an additional scheduling option - “Run
>>>>> Once” for this use case. The behavior would be that when a processor is
>>>>> set to run once it automatically stops after it has successfully processed
>>>>> one input.
>>>>>
>>>>> What do people think? We are willing to implement this small
>>>>> enhancement.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Naz Irizarry
>>>>> MITRE Corp.
>>>>> 617-893-0074
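
For anyone who wants to experiment with the run-once behavior discussed in this thread, here is a minimal sketch in plain Java. It is not NiFi code and not the implementation proposed in the PR. It only mimics the flag approach from Brandon's snippet, with the processor keeping its own flag as he suggests: a trigger does nothing unless the processor has been started, waits while the input queue is empty, and clears the flag after one item has gone through. The class RunOnceProcessor and the methods enqueue(), start(), and successCount() are hypothetical names made up for this illustration. In a real processor the UI would also need the scheduler's stopProcessor() call that Naz mentions, which this sketch does not model.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a processor with run-once semantics; not a NiFi class. */
final class RunOnceProcessor {
    // The processor's own "scheduled" flag (assumption: kept locally, not NiFi's).
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    // Simple in-memory queues standing in for the incoming connection and REL_SUCCESS.
    private final Queue<String> input = new ArrayDeque<>();
    private final Queue<String> success = new ArrayDeque<>();

    /** Puts one item on the input queue. */
    void enqueue(String item) {
        input.add(item);
    }

    /** Equivalent of pressing Start: allows exactly one more item through. */
    void start() {
        scheduled.set(true);
    }

    boolean isScheduled() {
        return scheduled.get();
    }

    int successCount() {
        return success.size();
    }

    /** Processes at most one item per start(); returns true if an item was processed. */
    boolean onTrigger() {
        if (!scheduled.get()) {
            return false; // already ran once, or never started
        }
        String item = input.poll();
        if (item == null) {
            return false; // no work yet: stay scheduled and wait for input
        }
        success.add(item);
        scheduled.set(false); // run-once: require another start() for the next item
        return true;
    }
}
```

With two items queued, a single start() lets only the first one through; the second stays in the queue until start() is called again, which matches the "one FlowFile per scheduling" behavior described above.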
