There's one more reason why you might need to initialize your transients in
activate() rather than setup(). If you are working with POJOs and depend on the
class of a tuple via a configured port attribute (TUPLE_CLASS), then this class
will only be available in the activate() method, and not in setup().
The order of calls is operator setup -> port setup (where the TUPLE_CLASS
attribute is populated) -> operator activate (class available to work with).
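
The call order above can be sketched with a small self-contained simulation.
This is only an illustration: SketchPort, SketchOperator and run() are
hypothetical stand-ins written for this mail, not the Apex API, and the driver
simply hard-codes the order setup -> port setup -> activate described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the operator lifecycle; these are NOT Apex classes.
class LifecycleSketch {

    /** Stand-in for a port whose tuple class is filled in during port setup. */
    static class SketchPort {
        Class<?> tupleClass; // stays null until the port's own setup has run

        void setup(Class<?> configuredTupleClass) {
            this.tupleClass = configuredTupleClass;
        }
    }

    /** Stand-in for an operator that derives a transient field from the tuple class. */
    static class SketchOperator {
        final SketchPort input = new SketchPort();
        final List<String> log = new ArrayList<>();
        transient String tupleClassName;

        void setup() {
            // Port setup has not happened yet, so the class is not available here.
            String seen = (input.tupleClass == null) ? "null" : input.tupleClass.getSimpleName();
            log.add("setup: tupleClass=" + seen);
        }

        void activate() {
            // Port setup has run by now, so class-dependent transients can be built.
            tupleClassName = input.tupleClass.getSimpleName();
            log.add("activate: tupleClass=" + tupleClassName);
        }
    }

    /** Drives the calls in the assumed order: operator setup, port setup, operator activate. */
    static List<String> run(Class<?> configuredTupleClass) {
        SketchOperator op = new SketchOperator();
        op.setup();
        op.input.setup(configuredTupleClass);
        op.activate();
        return op.log;
    }

    public static void main(String[] args) {
        run(String.class).forEach(System.out::println);
    }
}
```

Running main prints "setup: tupleClass=null" followed by
"activate: tupleClass=String", which is the reason class-dependent transients
belong in activate() in this scenario.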

~ Bhupesh


On Aug 5, 2017 06:01, "Pramod Immaneni" <pra...@datatorrent.com> wrote:

> It may not end up making a big difference but it prevents any worker
> threads you may have from starting off ahead of processing resulting in
> unnecessary queue build up right from the start. This might still be the
> case later in the processing but if it can be avoided to start off on the
> wrong foot why not start the right way. Today the difference may be a few
> hundred milliseconds, but that is not a contract, and the interval of
> execution between setup and activate could be higher in the future.
>
> Thanks
>
> On Fri, Aug 4, 2017 at 4:39 AM, Vlad Rozov <v.rozo...@gmail.com> wrote:
>
> > This recommendation to use activate() over setup() is questionable with
> > the introduction of back pressure. In a distributed streaming
> > application, operators need to handle downstream downtime, differences
> > between upstream and downstream throughput, busy output ports and back
> > pressure. A few hundred milliseconds of difference between setup() and
> > activate() is not something that I would be concerned about as an operator
> > developer once the above conditions are handled.
> >
> > Thank you,
> >
> > Vlad
> >
> >
> > On 8/3/17 15:37, Pramod Immaneni wrote:
> >
> >> Yes, activate is called closer to the start of tuple processing as far as
> >> Apex is concerned, so if you are writing an input operator that does
> >> asynchronous processing, where you will start receiving data as soon as
> >> you open a connection to your external source, it is better to open that
> >> connection in activate to reduce latency and buffer build up.
> >>
> >> On Thu, Aug 3, 2017 at 3:07 PM, Vlad Rozov <v.rozo...@gmail.com> wrote:
> >>
> >>> Correct, both setup() and activate() are called when an operator is
> >>> restored from a checkpoint. When an operator is restored from a
> >>> checkpoint, it is considered to be a new instance/deployment of the
> >>> operator with its state reset to the checkpoint. In this case Apex core
> >>> gives the operator a chance to initialize transient fields in either
> >>> setup() or activate().
> >>>
> >>> I am not aware of any use case where the platform will go through an
> >>> activate/deactivate cycle without setup/teardown, but such a code path
> >>> may be introduced in the future (for example, it may be used to manage an
> >>> input operator with a high emit rate). It is better not to make any
> >>> assumptions about how many times activate/deactivate may be called.
> >>>
> >>> Currently the main difference between setup() and activate() is described
> >>> in the javadoc for ActivationListener:
> >>>
> >>> * An example of where one would consider implementing ActivationListener
> >>> * is an input operator which wants to consume a high throughput stream.
> >>> * Since there is typically at least a few hundreds of milliseconds between
> >>> * the time the setup method is called and the first window, you would want
> >>> * to place the code to activate the stream inside activate instead of
> >>> * setup.
> >>>
> >>>
> >>> My recommendation is to use setup() to initialize transient fields
> >>> unless you need to deal with the above case.
> >>>
> >>> Thank you,
> >>>
> >>> Vlad
> >>>
> >>>
> >>> On 8/2/17 13:31, Ananth G wrote:
> >>>
> >>>> Hello Vlad,
> >>>>
> >>>> Thanks for your response.
> >>>>
> >>>>> Do you refer to restoring from a checkpoint as serialize/deserialize
> >>>>> cycles?
> >>>>
> >>>> Yes.
> >>>>
> >>>>> In case of restoring from a checkpoint (deserialization) setup() is a
> >>>>> part of a redeployment request, AFAIK.
> >>>>
> >>>> This sounds a bit in contradiction to the response from Sanjay in the
> >>>> mail thread below. I tried to quickly glance in the apex-core code and
> >>>> it looks like both are being called (perhaps I am entirely wrong on this
> >>>> as it was only a quick scan). I was referring to the code in
> >>>> StreamingContainer.java in the engine package and the method called
> >>>> deploy().
> >>>>
> >>>>> Please see ActivationListener javadoc for details when it is necessary
> >>>>> to use activate() vs setup().
> >>>>
> >>>> I had to raise this question in the mail after going through the
> >>>> javadoc. The javadoc is a bit cryptic in this scenario of
> >>>> serialise/deserialize. Also the javadoc is not clear as to what we
> >>>> meant by activate/deactivate being called multiple times whereas setup
> >>>> is called once in a lifetime of the operator. If the setup is called
> >>>> once in lifetime of an operator per javadoc, did it mean once in the
> >>>> lifetime of the JVM instantiating via the constructor or across the
> >>>> deserialise cycles of the passivated operator state? If it is once
> >>>> across all passivated instances of the operator, then setup() would not
> >>>> be called multiple times and hence not a great location for transient
> >>>> variables? If setup() is called across deserialise cycles, then I find
> >>>> it more confusing as to why we need setup() and activate() methods
> >>>> almost having the same functionality.
> >>>>
> >>>> Thoughts ?
> >>>>
> >>>>
> >>>> Regards,
> >>>> Ananth
> >>>>
> >>>>
> >>>> On 1 Aug 2017, at 3:38 am, Vlad Rozov <v.ro...@datatorrent.com> wrote:
> >>>>
> >>>>> Do you refer to restoring from a checkpoint as serialize/deserialize
> >>>>> cycles? There are no calls to setup/teardown and/or
> >>>>> activate/deactivate during checkpointing/serialization. In case of
> >>>>> restoring from a checkpoint (deserialization) setup() is a part of a
> >>>>> redeployment request, AFAIK. The best answer to question 3 is "it
> >>>>> depends". In most cases using setup() to resolve all transient fields
> >>>>> is as good as doing that in activate(). Please see the
> >>>>> ActivationListener javadoc for details on when it is necessary to use
> >>>>> activate() vs setup().
> >>>>>
> >>>>> Thank you,
> >>>>>
> >>>>> Vlad
> >>>>>
> >>>>> On 7/29/17 19:58, Sanjay Pujare wrote:
> >>>>>
> >>>>>> The Javadoc comment for
> >>>>>> com.datatorrent.api.Operator.ActivationListener<CONTEXT> (in
> >>>>>> https://github.com/apache/apex-core/blob/master/api/src/main/java/com/datatorrent/api/Operator.java)
> >>>>>> should hopefully answer your questions.
> >>>>>>
> >>>>>> Specifically:
> >>>>>>
> >>>>>> 1. No, setup() is called only once in the entire lifetime (
> >>>>>> http://apex.apache.org/docs/apex/operator_development/#setup-call)
> >>>>>>
> >>>>>> 2. Yes. When an operator is "activated" (for the first time in its
> >>>>>> life or on reactivation after a failover), activate() is called before
> >>>>>> the first beginWindow() is called.
> >>>>>>
> >>>>>> 3. Yes.
> >>>>>>
> >>>>>>
> >>>>>> On Sun, Jul 30, 2017 at 12:18 AM, Ananth G <ananthg.a...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello All,
> >>>>>>>
> >>>>>>> I was looking at the documentation and could not get a clear
> >>>>>>> distinction of behaviours for setup() and activate() during scenarios
> >>>>>>> when an operator is passivated (ex: application shutdown, repartition
> >>>>>>> use cases) and being brought back to life again. Could someone from
> >>>>>>> the community advise me on the following questions?
> >>>>>>>
> >>>>>>> 1. Is setup() called in these scenarios (serialize/deserialize
> >>>>>>> cycles) as well?
> >>>>>>>
> >>>>>>> 2. I am assuming activate() is called in these scenarios? The javadoc
> >>>>>>> for activation states that activate() can be called multiple times
> >>>>>>> (without explicitly stating why) and my assumption is that it is
> >>>>>>> because of these scenarios.
> >>>>>>>
> >>>>>>> 3. If setup() is only called once during the lifetime of an operator,
> >>>>>>> is it fair to assume that activate() is the best place to resolve all
> >>>>>>> of the transient fields of an operator?
> >>>>>>>
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Ananth
> >>>>>>>
> >>>>>>>
> >
>
