Re: [DISCUSS] Run Once scheduling

Irizarry Jr., Nazario Tue, 31 Jan 2017 07:18:42 -0800

I am about to submit a PR implementing run-once scheduling. There is no
outstanding JIRA ticket for this, so what NIFI-XXXX number or other label
should I put in the title of the PR?


Thanks,

Naz Irizarry
MITRE Corp.
617-893-0074



> On Jan 12, 2017, at 3:55 PM, Irizarry Jr., Nazario <[email protected]> wrote:
> 
> I think it is a matter of the model in one's head. If one thinks in a
> continuous-activation paradigm, the green arrow versus the red square
> indicates what you point out. In an ad hoc run-once paradigm, on the other
> hand, the green arrow is a nice, succinct indicator of what has not run yet.
> In an analytics environment, processing can take minutes to hours for some
> processors. As processing goes on, the processors that still show green
> arrows indicate what is left to complete in the “visual script.”
> 
> Consider the following example. Say there are five processors. The
> first processor, say A, makes a query and gets data. Depending on what I
> know about today’s input to A, the output should be directed to B1, B2, B3,
> or B4. The B's are variations on a particular analytic algorithm, and
> most of the time only one of them needs to be used. On one day (based on
> external knowledge) I click on A and B1 and then the Start arrow. On another
> day I modify the query, click on A and B2, and then click the Start arrow,
> and so on. Clearly I could have four flows and start/stop entire flows.
> But as the number of processing stages and the number of processing
> alternatives at each stage increase, the combinatorial growth makes
> distinct flows painful to manage. Sometimes it is easier to have one
> all-encompassing flow and let the analyst shift-click the portions they
> want to invoke for the next “run.”
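To make the combinatorial growth concrete, here is a small arithmetic sketch. It assumes (for illustration only; the thread does not state this) a single source processor like A feeding k stages, each offering m interchangeable alternatives like B1..B4, with exactly one alternative used per run. Separate flows for every combination then number m^k, while one all-encompassing flow holds only 1 + m*k processors. The class and method names are hypothetical.

```java
// Illustrative arithmetic only: one flow per combination of alternatives
// versus one all-encompassing flow.
// Assumption: one source processor (like "A") feeds k stages, and each stage
// offers m alternatives (like B1..B4), exactly one of which is used per run.
final class FlowGrowth {
    private FlowGrowth() {}

    /** Distinct flows needed if every combination gets its own flow: m^k. */
    static long distinctFlows(int alternativesPerStage, int stages) {
        long count = 1;
        for (int i = 0; i < stages; i++) {
            // multiplyExact throws on overflow instead of silently wrapping
            count = Math.multiplyExact(count, (long) alternativesPerStage);
        }
        return count;
    }

    /** Processors in a single flow containing every alternative: 1 + m * k. */
    static int processorsInSingleFlow(int alternativesPerStage, int stages) {
        return 1 + alternativesPerStage * stages;
    }
}
```

For the example above (k = 1, m = 4) this gives 4 separate flows versus one 5-processor flow; with three such stages it is already 64 flows versus one 13-processor flow.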
> 
> 
> Naz Irizarry
> MITRE Corp.
> 617-893-0074
> 
> 
> 
>> On Jan 12, 2017, at 2:14 PM, Joe Witt <[email protected]> wrote:
>> 
>> Naz
>> 
>> The green arrow vs. red square says "scheduled to execute" vs. "not
>> scheduled to execute". Most processors, such as those that take
>> input flow files from a connection, are not actually executed even when
>> scheduled unless there is work to do (data sitting in the queue) and
>> space available (on all destination relationships). Because of this,
>> I'm suggesting you consider leaving them all scheduled to execute even
>> though they won't actually be doing anything most of the time. The
>> stats on each component tell you how many times it was actually invoked,
>> how much data it processed, etc., so you'll see that they're not doing
>> anything most of the time.
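The rule Joe describes can be sketched as a simple predicate. This is a deliberately simplified model, not NiFi's actual scheduler code: it assumes the only inputs are whether the processor is scheduled, how many items are queued, and how much free space each destination connection has. The class and parameter names are hypothetical.

```java
import java.util.List;

// Simplified model of the rule above, NOT NiFi's scheduler implementation:
// a scheduled processor is only invoked when its input queue has data AND
// every destination connection has room (no back pressure).
// All names here are hypothetical.
final class TriggerGate {
    private TriggerGate() {}

    static boolean shouldInvoke(boolean scheduled, int queuedInput,
                                List<Integer> freeSlotsPerDestination) {
        if (!scheduled || queuedInput == 0) {
            return false; // stopped, or nothing to do yet
        }
        for (int free : freeSlotsPerDestination) {
            if (free <= 0) {
                return false; // at least one destination is full
            }
        }
        return true;
    }
}
```

In this model, a scheduled processor with an empty queue is never invoked, which is the behavior Joe relies on when suggesting that everything simply stay scheduled.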
>> 
>> You mentioned not wanting to have to do anything manual, yet run-once
>> would be a manual construct, right?
>> 
>> I don't mean to suggest I'm closed off to the idea of a run-once
>> concept; I just really want to understand your use case better.
>> 
>> Thanks
>> Joe
>> 
>> On Thu, Jan 12, 2017 at 2:11 PM, Irizarry Jr., Nazario <[email protected]> 
>> wrote:
>>> Correction: it is the processor scheduler’s stopProcessor() method, not
>>> the flow controller’s, that needs to be invoked so the UI shows the
>>> processor as stopped.
>>> 
>>> Naz Irizarry
>>> MITRE Corp.
>>> 617-893-0074
>>> 
>>> 
>>> 
>>>> On Jan 12, 2017, at 2:08 PM, Irizarry Jr., Nazario <[email protected]> wrote:
>>>> 
>>>> Yes, we found that to keep the UI in sync (so the processor looks stopped
>>>> after it runs once), the flow controller's stopProcessor() method has to
>>>> be called.
>>>> 
>>>> Naz Irizarry
>>>> MITRE Corp.
>>>> 617-893-0074
>>>> 
>>>> 
>>>> 
>>>> On Jan 12, 2017, at 1:41 PM, Brandon DeVries
>>>> <[email protected]> wrote:
>>>> 
>>>> I think answering Joe's question is step one.  However, I was curious and
>>>> tried something:
>>>> 
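>>>> @Override
>>>> public void onTrigger(final ProcessContext context, final ProcessSession session) {
>>>>     // Once this "scheduling" has handled a FlowFile, do nothing further.
>>>>     if (!isScheduled()) {
>>>>         return;
>>>>     }
>>>>     FlowFile flowFile = session.get();
>>>>     if (flowFile == null) {
>>>>         return; // nothing queued yet; try again on the next trigger
>>>>     }
>>>>     session.transfer(flowFile, REL_SUCCESS);
>>>>     // Clear the built-in scheduled flag so later triggers return early.
>>>>     // Note: this does NOT stop the processor on the graph.
>>>>     updateScheduledFalse();
>>>> }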
>>>> 
>>>> This processes one FlowFile per "scheduling", i.e., one FlowFile goes
>>>> through, and you need to stop/start the processor to get another. I'm not
>>>> 100% sure that nothing else would ever call the "built-in" updateScheduled*
>>>> methods, but worst case the processor could maintain its own flag. Also,
>>>> for what it's worth, calling updateScheduledFalse() doesn't "stop" the
>>>> processor on the graph... as Oleg mentions, this (or something like this)
>>>> could potentially be visually confusing.
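One way a processor could "maintain its own flag", as suggested above, is an atomic one-shot gate. This is a plain-Java sketch so it runs without NiFi; the class and method names are hypothetical. It assumes reset() would be called from the processor's @OnScheduled method, tryClaim() before session.get(), and release() when the queue turns out to be empty.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical processor-owned "run once" flag, independent of the built-in
// updateScheduled* methods. Plain Java; names are illustrative only.
final class RunOnceGate {
    private final AtomicBoolean available = new AtomicBoolean(false);

    /** Arms the gate for a new "scheduling" (e.g. from an @OnScheduled method). */
    void reset() {
        available.set(true);
    }

    /** Returns true for exactly one caller per arming, even under concurrent tasks. */
    boolean tryClaim() {
        return available.compareAndSet(true, false);
    }

    /** Gives the claim back, e.g. when session.get() returned null. */
    void release() {
        available.set(true);
    }
}
```

The compareAndSet is the design point: a separate isScheduled() check followed later by updateScheduledFalse() is not atomic, so with more than one concurrent task two threads could each pass the check and process a FlowFile. The atomic claim lets only one through per arming.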
>>>> 
>>>> I'm not sure how this fits in a production system, but this plus
>>>> GenerateFlowFile and some back pressure seems potentially useful for
>>>> debugging. I know I've faked this behavior before with a GenerateFlowFile
>>>> w/ run schedule "1 day" or something... then again, maybe it would be best
>>>> not to create something that could be confusing / misused in a production
>>>> system.
>>>> 
>>>> Brandon
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Thu, Jan 12, 2017 at 1:02 PM Joe Witt
>>>> <[email protected]> wrote:
>>>> 
>>>> Naz,
>>>> 
>>>> Why not just leave all the processes running?  If the data only
>>>> arrives periodically that is ok, right?
>>>> 
>>>> Thanks
>>>> Joe
>>>> 
>>>> On Thu, Jan 12, 2017 at 10:54 AM, Irizarry Jr., Nazario
>>>> <[email protected]> wrote:
>>>> On a project that I am on, we have been looking at using NiFi for
>>>> orchestrations that are invoked infrequently. For example, once a month a
>>>> new data input product becomes available, and one then wants to run it
>>>> through a set of processing steps that can be nicely implemented using NiFi
>>>> processors. However, using interval or cron scheduling for this purpose
>>>> gets cumbersome after a while, because these occasional flows have to be
>>>> started and then manually stopped.
>>>> 
>>>> It would be fairly easy to add another scheduling option, “Run Once,”
>>>> for this use case. The behavior would be that a processor set to run
>>>> once automatically stops after it has successfully processed one
>>>> input.
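The proposed "Run Once" behavior can be modeled as a loop that keeps triggering a task until it reports one successful input, then stops and fires a stop hook exactly once. This is a toy model, not NiFi code. The stop hook is a hypothetical stand-in for whatever call makes the UI show the processor as stopped (the thread mentions a stopProcessor() method for that), and maxTriggers is an assumed safety cap for the sketch.

```java
import java.util.function.BooleanSupplier;

// Toy model of a "Run Once" scheduling strategy. Not NiFi code; all names
// are hypothetical. The stop hook stands in for the framework call that
// marks the processor as stopped so the UI stays in sync.
final class RunOnceLoop {
    private RunOnceLoop() {}

    /**
     * @param trigger     returns true when one input was processed successfully
     * @param stopHook    invoked exactly once, right after the successful trigger
     * @param maxTriggers safety cap for this toy model
     * @return number of times the trigger was invoked
     */
    static int run(BooleanSupplier trigger, Runnable stopHook, int maxTriggers) {
        int invocations = 0;
        while (invocations < maxTriggers) {
            invocations++;
            if (trigger.getAsBoolean()) {
                stopHook.run(); // auto-stop after the first success
                return invocations;
            }
        }
        return invocations; // no input within the cap; the hook never fired
    }
}
```

Triggers that find no input simply return false and the loop tries again, which mirrors the "wait for data, process once, then stop" semantics described in the proposal.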
>>>> 
>>>> What do people think?  We are willing to implement this small
>>>> enhancement.
>>>> 
>>>> Cheers,
>>>> 
>>>> Naz Irizarry
>>>> MITRE Corp.
>>>> 617-893-0074
>>>> 
>>> 
>> 
>
