Re: Difference between setup() and activate()

Bhupesh Chawda Fri, 04 Aug 2017 20:09:06 -0700

There's one more reason why you might need to initialize your transients in
activate rather than setup. If you are working with POJOs and depend on the
class of a tuple via a configured port attribute (TUPLE_CLASS), then
this class will only be available in the activate method, and not in
setup.
The order of calls is operator setup -> port setup (where the TUPLE_CLASS
attribute is populated) -> operator activate (class available to work with).
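
A minimal sketch of that ordering, using stand-in classes rather than the
real Apex types. SketchPort, SketchOperator and deploy() are hypothetical
names made up for illustration; in a real operator the class would come from
the port's TUPLE_CLASS attribute and the platform would drive the callbacks.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Stand-alone simulation of the call order described above.
 * All types here are hypothetical stand-ins, NOT the Apex API.
 */
final class LifecycleOrderSketch {

    /** Stand-in for an input port that learns its tuple class during port setup. */
    static class SketchPort {
        Class<?> tupleClass; // stays null until port setup runs

        void setup(Class<?> configuredTupleClass) {
            this.tupleClass = configuredTupleClass;
        }
    }

    /** Stand-in operator that records what it can see in each callback. */
    static class SketchOperator {
        final SketchPort input = new SketchPort();
        final List<String> log = new ArrayList<>();
        transient String tupleClassName; // transient: rebuilt on every deployment

        void setup() {
            // Port setup has not run yet, so the tuple class is still unknown here.
            log.add("operator setup: tupleClass=" + input.tupleClass);
        }

        void activate() {
            // Port setup has already run, so the tuple class can be resolved now.
            tupleClassName = input.tupleClass.getName();
            log.add("operator activate: tupleClass=" + tupleClassName);
        }
    }

    /** Drives the callbacks in the order stated above and returns the log. */
    static List<String> deploy(SketchOperator op, Class<?> configuredTupleClass) {
        op.setup();                           // 1. operator setup
        op.input.setup(configuredTupleClass); // 2. port setup
        op.activate();                        // 3. operator activate
        return op.log;
    }

    public static void main(String[] args) {
        deploy(new SketchOperator(), String.class).forEach(System.out::println);
    }
}
```

With String.class as the configured tuple class, the setup entry logs
tupleClass=null and the activate entry logs tupleClass=java.lang.String.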


~ Bhupesh


On Aug 5, 2017 06:01, "Pramod Immaneni" <pra...@datatorrent.com> wrote:

> It may not end up making a big difference, but it prevents any worker
> threads you may have from starting ahead of processing, which results in
> unnecessary queue build-up right from the start. This might still be the
> case later in the processing, but if starting off on the wrong foot can be
> avoided, why not start the right way? Today the difference may be a few
> hundred milliseconds, but that is not a contract, and the interval
> between setup and activate could be higher in the future.
>
> Thanks
>
> On Fri, Aug 4, 2017 at 4:39 AM, Vlad Rozov <v.rozo...@gmail.com> wrote:
>
> > This recommendation to use activate() over setup() is questionable with
> > the introduction of back pressure. In a distributed streaming
> > application, operators need to handle downstream downtime, differences
> > between upstream and downstream throughput, busy output ports and back
> > pressure. A difference of a few hundred milliseconds between setup() and
> > activate() is not something that I would be concerned about as an
> > operator developer once the above conditions are handled.
> >
> > Thank you,
> >
> > Vlad
> >
> >
> > On 8/3/17 15:37, Pramod Immaneni wrote:
> >
> >> Yes, activate is called closer to the start of tuple processing as far
> >> as Apex is concerned, so if you are doing things like writing an input
> >> operator that does asynchronous processing, where you will start
> >> receiving data as soon as you open a connection to your external source,
> >> it is better to do it in activate to reduce latency and buffer build-up.
> >>
> >> On Thu, Aug 3, 2017 at 3:07 PM, Vlad Rozov <v.rozo...@gmail.com> wrote:
> >>
> >>> Correct, both setup() and activate() are called when an operator is
> >>> restored from a checkpoint. When an operator is restored from a
> >>> checkpoint, it is considered to be a new instance/deployment of the
> >>> operator with its state reset to the checkpoint. In this case Apex core
> >>> gives the operator a chance to initialize transient fields in either
> >>> setup() or activate().
> >>>
> >>> I am not aware of any use case where the platform will go through an
> >>> activate/deactivate cycle without setup/teardown, but such a code path
> >>> may be introduced in the future (for example, it may be used to manage
> >>> an input operator with a high emit rate). It is better not to make any
> >>> assumptions about how many times activate/deactivate may be called.
> >>>
> >>> Currently the main difference between setup() and activate() is
> >>> described in the javadoc for ActivationListener:
> >>>
> >>> An example of where one would consider implementing ActivationListener
> >>> is an input operator which wants to consume a high throughput stream.
> >>> Since there is typically at least a few hundreds of milliseconds
> >>> between the time the setup method is called and the first window, you
> >>> would want to place the code to activate the stream inside activate
> >>> instead of setup.
> >>>
> >>>
> >>> My recommendation is to use setup() to initialize transient fields
> >>> unless you need to deal with the above case.
> >>>
> >>> Thank you,
> >>>
> >>> Vlad
> >>>
> >>>
> >>> On 8/2/17 13:31, Ananth G wrote:
> >>>
> >>>> Hello Vlad,
> >>>>
> >>>> Thanks for your response.
> >>>>
> >>>>> Do you refer to restoring from a checkpoint as serialize/deserialize
> >>>>> cycles?
> >>>>
> >>>> Yes.
> >>>>
> >>>>> In case of restoring from a checkpoint (deserialization) setup() is a
> >>>>> part of a redeployment request, AFAIK.
> >>>>
> >>>> This sounds a bit in contradiction to the response from Sanjay in the
> >>>> mail thread below. I tried to quickly glance at the apex-core code and
> >>>> it looks like both are being called (perhaps I am entirely wrong on
> >>>> this, as it was only a quick scan). I was referring to the code in
> >>>> StreamingContainer.java in the engine package and the method called
> >>>> deploy().
> >>>>
> >>>>> Please see ActivationListener javadoc for details when it is necessary
> >>>>> to use activate() vs setup().
> >>>>
> >>>> I had to raise this question in the mail after going through the
> >>>> javadoc. The javadoc is a bit cryptic in this scenario of
> >>>> serialize/deserialize. Also, the javadoc is not clear as to what is
> >>>> meant by activate/deactivate being called multiple times whereas setup
> >>>> is called once in the lifetime of the operator. If setup is called once
> >>>> in the lifetime of an operator per the javadoc, does that mean once in
> >>>> the lifetime of the JVM instantiating it via the constructor, or once
> >>>> across the deserialize cycles of the passivated operator state? If it
> >>>> is once across all passivated instances of the operator, then setup()
> >>>> would not be called multiple times and hence is not a great location
> >>>> for transient variables? If setup() is called across deserialize
> >>>> cycles, then I find it more confusing as to why we need setup() and
> >>>> activate() methods having almost the same functionality.
> >>>>
> >>>> Thoughts ?
> >>>>
> >>>>
> >>>> Regards,
> >>>> Ananth
> >>>>
> >>>>
> >>>> On 1 Aug 2017, at 3:38 am, Vlad Rozov <v.ro...@datatorrent.com> wrote:
> >>>>
> >>>>> Do you refer to restoring from a checkpoint as serialize/deserialize
> >>>>> cycles? There are no calls to setup/teardown and/or
> >>>>> activate/deactivate during checkpointing/serialization. In case of
> >>>>> restoring from a checkpoint (deserialization) setup() is a part of a
> >>>>> redeployment request, AFAIK. The best answer to question 3 is: it
> >>>>> depends. In most cases using setup() to resolve all transient fields
> >>>>> is as good as doing that in activate(). Please see ActivationListener
> >>>>> javadoc for details when it is necessary to use activate() vs setup().
> >>>>>
> >>>>> Thank you,
> >>>>>
> >>>>> Vlad
> >>>>>
> >>>>> On 7/29/17 19:58, Sanjay Pujare wrote:
> >>>>>
> >>>>>> The Javadoc comment for
> >>>>>> com.datatorrent.api.Operator.ActivationListener<CONTEXT> (in
> >>>>>> https://github.com/apache/apex-core/blob/master/api/src/main/java/com/datatorrent/api/Operator.java)
> >>>>>> should hopefully answer your questions.
> >>>>>>
> >>>>>> Specifically:
> >>>>>>
> >>>>>> 1. No, setup() is called only once in the entire lifetime (
> >>>>>> http://apex.apache.org/docs/apex/operator_development/#setup-call)
> >>>>>>
> >>>>>> 2. Yes. When an operator is "activated" (for the first time in its
> >>>>>> life or on reactivation after a failover), activate() is called
> >>>>>> before the first beginWindow() is called.
> >>>>>>
> >>>>>> 3. Yes.
> >>>>>>
> >>>>>>
> >>>>>> On Sun, Jul 30, 2017 at 12:18 AM, Ananth G <ananthg.a...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hello All,
> >>>>>>>
> >>>>>>> I was looking at the documentation and could not get a clear
> >>>>>>> distinction between the behaviours of setup() and activate() in
> >>>>>>> scenarios where an operator is passivated (e.g. application
> >>>>>>> shutdown, repartition use cases) and brought back to life again.
> >>>>>>> Could someone from the community advise me on the following
> >>>>>>> questions?
> >>>>>>>
> >>>>>>> 1. Is setup() called in these scenarios (serialize/deserialize
> >>>>>>> cycles) as well?
> >>>>>>>
> >>>>>>> 2. I am assuming activate() is called in these scenarios? The
> >>>>>>> javadoc for activation states that activate() can be called
> >>>>>>> multiple times (without explicitly stating why), and my assumption
> >>>>>>> is that it is because of these scenarios.
> >>>>>>>
> >>>>>>> 3. If setup() is only called once during the lifetime of an
> >>>>>>> operator, is it fair to assume that activate() is the best place to
> >>>>>>> resolve all of the transient fields of an operator?
> >>>>>>>
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Ananth
> >>>>>>>
> >>>>>>>
> >
>
