If we're willing to have a LoopFlowFile processor, why not consider a
PenalizeFlowFile processor too?  Just throwing it out for discussions sake,
but penalization could ultimately be realized in multiple ways:

a) by both the processor developer (and DFM via penalty duration), as it is
done today;
b) by the DFM as part of the Connection Settings, per Mark's proposal;
c) by the DFM as part of a (alternative) standard processor, with various
(and future) penalization options configured as Processor Properties
d) all the above

The line is blurry between what functionality can/should go into the
connection or queue vs. which functionality can/should go into a
processor.  If we're willing to say that LoopFlowFile should be defined as
a processor, I don't see much difference between "loop for 10 minutes" vs.
"penalize for 10 minutes" (beyond the obvious).

As a general statement, I think it's good to minimize the various ways in
which processors, queues and relationships are managed.  Today we have
configuration options for:

 - queues (expiration, prioritizers, back pressure)
-  settings tab of every processor (scheduling strategy, penalty duration,
run schedule)
-  specific settings to the processor itself (processor properties).

Flowfile expiration is handled on the queue, while penalization is
configured on the processor.  At the end of the day, whatever reduces the
number places a DFM has to touch is a good thing.

Perhaps as a radical proposal, why don't we add some @Experimental
processors which do things like PenalizeFlowFile, LoopFlowfile,
DelayFlowFile, PrioritizeFlow, and see what the experience is like using
these vs. using the existing functions.  If the community thinks there's
too much overlap, we can remove these from the 1.0 release.  But at least
we'll get some A/B testing by having these queue management services
realized as processors vs. built into other object types.  Maybe put these
into a nifi-queue-management.nar extension for people to play with?

Just food for thought.

Adam



On Thu, Jan 28, 2016 at 2:36 PM, Mark Payne <marka...@hotmail.com> wrote:

> I think for the particular pattern, I would like to see a LoopFlowFile
> processor (or something with a better name perhaps :) )
> that would allow the user to just set a threshold for how many times to
> try or how long to keep trying or both and then
> send to either a 'threshold exceeded' or 'below threshold' relationship.
> I.e., set a threshold of 3 times or 10 minutes and
> then route to one or the other. It would make that pattern a lot easier by
> just using a single easy-to-understand Processor.
>
>
>
> > On Jan 28, 2016, at 2:31 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
> >
> > That's a good point, Mark. I also agree that it's better to give the user
> > control whenever possible. I imagine the RouteOnAttribute pattern to
> > eventually "give up" on a FlowFile will be a common pattern, and so so we
> > should account for that, rather than forcing the user into knowing this
> > pattern.
> >
> > On Thu, Jan 28, 2016 at 2:11 PM, Mark Payne <marka...@hotmail.com>
> wrote:
> >
> >>
> >> The retry idea concerns me a bit. If we were to have a method like:
> >>
> >> penalizeOrTransfer(FlowFile flowFile, int numberOfTries, Relationship
> >> relationship)
> >>
> >> I think that leaves out some info - even if a FlowFile is
> >> penalized, it must be penalized and sent somewhere. So there would have
> to
> >> be
> >> a relationship to send it to if penalized and another to send it to if
> not
> >> penalizing.
> >> This also I think puts more onus on the developer to understand how it
> >> would be
> >> used - I believe the user should be making decisions about how many
> times
> >> to
> >> penalize, not the developer.
> >>
> >>> On Jan 28, 2016, at 2:03 PM, Bryan Bende <bbe...@gmail.com> wrote:
> >>>
> >>> Regarding throwing an exception... I believe if you are extending
> >>> AbstractProcessor and an exception is thrown out of onTrigger() then
> the
> >>> session is rolled back and any flow files that were accessed are
> >> penalized,
> >>> which results in leaving them in the incoming connection to the
> processor
> >>> and not being retried until the penalty duration passes. This seems
> >> similar
> >>> to what Michael described, although it is not stopping the processor
> from
> >>> processing other incoming  flow files.
> >>>
> >>> Ricky's retry idea sounds interesting... I think a lot of people handle
> >>> this today by creating a retry loop using UpdateAttribute and
> >>> RouteOnAttribute [1].
> >>>
> >>> [1]
> >>>
> >>
> https://cwiki.apache.org/confluence/download/attachments/57904847/Retry_Count_Loop.xml?version=1&modificationDate=1433271239000&api=v2
> >>>
> >>>
> >>> On Thu, Jan 28, 2016 at 1:24 PM, Ricky Saltzer <ri...@cloudera.com>
> >> wrote:
> >>>
> >>>> Is there currently a way to know how many times a FlowFile has been
> >>>> penalized? Do we have use cases where we want to penalize a FlowFile
> *n
> >>>> *number
> >>>> of times before sending it down an alternate relationship? I could
> >> imagine
> >>>> an API like penalizeOrTransfer(FlowFile flowFile, int numberOfTries,
> >>>> Relationship relationship). For example, someone might want to
> process a
> >>>> FlowFile three times before giving up on it.
> >>>>
> >>>> On Thu, Jan 28, 2016 at 12:47 PM, Michael de Courci <
> >>>> mdecou...@googlemail.com> wrote:
> >>>>
> >>>>> Matt thanks for your reply
> >>>>>
> >>>>> I guess what I am saying in that case - if there is an error in a
> >>>>> FlowFile, then the processor that detects this cannot proceed so
> >> instead
> >>>> of
> >>>>> calling an action to penalize the FlowFile it raises an exception
> >>>>> OutOFServiceException or ProcessorException.
> >>>>> You could have an exception cause PeanilisedFlowFileException for
> this
> >>>>> case.
> >>>>>
> >>>>> But within the processor other error causes may arise for an
> >>>>> OutOFServiceException
> >>>>>
> >>>>> The point is that if the processor threw this exception then there
> can
> >> be
> >>>>> a duration configuration - a time limit to keep this processor out of
> >>>>> service and the connection to it and possibly any processors leading
> >> upto
> >>>>> it - Naturally this will need to be indicated on the DFM - this will
> >> free
> >>>>> resources and make the flow well behaved.
> >>>>>
> >>>>> Environmental failures will simply be a different category/cause of
> >> error
> >>>>> that can be wrapped/captured also with a more general one
> >>>>>
> >>>>> With Kind Regards
> >>>>> Michael de Courci
> >>>>> mdecou...@gmail.com
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On 28 Jan 2016, at 17:16, Matt Gilman <matt.c.gil...@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>> Just to recap/level set...
> >>>>>>
> >>>>>> The distinct between yielding and penalization is important.
> >>>> Penalization
> >>>>>> is an action taken on a FlowFile because the FlowFile cannot be
> >>>> processed
> >>>>>> right now (like a naming conflict for instance). The Processor is
> >>>>>> indicating that it cannot process that specific FlowFile at the
> moment
> >>>>> but
> >>>>>> may be able to process the next. Yielding is an indication that the
> >>>>>> Processor is unable to work at all at the moment likely due to an
> >>>>>> environmental issue (like the out of service comment).
> >>>>>>
> >>>>>> If the concept of penalization were moved to a connection, does it
> >>>>>> automatically penalize all FlowFile transferred to it? We would lose
> >>>> some
> >>>>>> granularity if a Processor wanted to penalize some FlowFile routed
> to
> >> a
> >>>>>> given Relationship but not others. I'm not sure if this is done in
> >>>>> practice
> >>>>>> or not, just wanted to mention it.
> >>>>>>
> >>>>>> Outside of this minor concern, I like the idea. I especially like
> that
> >>>> it
> >>>>>> would help with the consistency of Processor behavior and
> transparency
> >>>>>> about what the data flow is actually doing.
> >>>>>>
> >>>>>> Matt
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Jan 28, 2016 at 12:00 PM, Michael de Courci <
> >>>>>> mdecou...@googlemail.com> wrote:
> >>>>>>
> >>>>>>> Hi
> >>>>>>> I think it would be better/simpler to have one “out of service”
> >>>> concept
> >>>>>>> to replace penalizing and yielding and when a plugin throws an
> >>>> exception
> >>>>>>> then the plugin is deemed out of service, for a duration and so the
> >>>>>>> connection to that plugin is disabled for the out of service
> >> duration.
> >>>>>>>
> >>>>>>> When a plugin is out of service and the connection disabled - then
> >>>>>>> resources that it uses will be freed(yielded).
> >>>>>>>
> >>>>>>> The question then is what the behaviour of the plugin before the
> >>>>> disabled
> >>>>>>> connection - should be.  My thought is to tend towards stability
> and
> >>>>> make
> >>>>>>> sure resources are freed, so there may need to be a “domino
> >>>>> effect”/cascade
> >>>>>>> affect where all plugins before are gradually put out of service.
> >>>>>>>
> >>>>>>>
> >>>>>>> With Kind Regards
> >>>>>>> Michael de Courci
> >>>>>>> mdecou...@gmail.com
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 28 Jan 2016, at 16:34, Mark Payne <marka...@hotmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>> All,
> >>>>>>>>
> >>>>>>>> I've been thinking about how we handle the concept of penalizing
> >>>>>>> FlowFiles. We've had a lot of questions
> >>>>>>>> lately about how penalization works & the concept in general.
> Seems
> >>>> the
> >>>>>>> following problems exist:
> >>>>>>>>
> >>>>>>>> - Confusion about difference between penalization & yielding
> >>>>>>>> - DFM sees option to configure penalization period on all
> >> processors,
> >>>>>>> even if they don't penalize FlowFiles.
> >>>>>>>> - DFM cannot set penalty duration in 1 case and set a different
> >> value
> >>>>>>> for a different case (different relationship, for example).
> >>>>>>>> - Developers often forget to call penalize()
> >>>>>>>> - Developer has to determine whether or not to penalize when
> >>>> building a
> >>>>>>> processor. It is based on what the developer will
> >>>>>>>> think may make sense, but in reality DFM's sometimes want to
> >> penalize
> >>>>>>> things when the processor doesn't behave that way.
> >>>>>>>>
> >>>>>>>> I'm wondering if it doesn't make sense to remove the concept of
> >>>>>>> penalization all together from Processors and instead
> >>>>>>>> move the Penalty Duration so that it's a setting on the
> Connection.
> >> I
> >>>>>>> think this would clear up the confusion and give the DFM
> >>>>>>>> more control over when/how long to penalize. Could set to the
> >> default
> >>>>> to
> >>>>>>> 30 seconds for self-looping connections and no penalization
> >>>>>>>> for other connections.
> >>>>>>>>
> >>>>>>>> Any thoughts?
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> -Mark
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Ricky Saltzer
> >>>> http://www.cloudera.com
> >>>>
> >>
> >>
> >
> >
> > --
> > Ricky Saltzer
> > http://www.cloudera.com
>
>

Reply via email to