There's one more reason why you might need to initialize your transients in activate() rather than setup(). If you are working with POJOs and depend on the class of the tuple via a configured port attribute (TUPLE_CLASS), that class is only available in activate(), not in setup(). The order of calls is: operator setup -> port setup (where the TUPLE_CLASS attribute is populated) -> operator activate (class available to work with).
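To illustrate the ordering above, here is a minimal, self-contained sketch in plain Java. It deliberately does not use the Apex API: PojoPort, PojoOperator and deploy() are hypothetical stand-ins that only mimic the call order described above (operator setup, then port setup, then operator activate), so treat it as an illustration of the ordering and not as how the engine is implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class LifecycleSketch {

    // Hypothetical stand-in for a port configured with a TUPLE_CLASS attribute.
    static class PojoPort {
        Class<?> tupleClass; // stays null until the port's own setup has run

        void setup(Class<?> configuredTupleClass) {
            this.tupleClass = configuredTupleClass;
        }
    }

    // Hypothetical stand-in for an operator that needs the tuple class.
    static class PojoOperator {
        final PojoPort input = new PojoPort();
        final List<String> log = new ArrayList<>();
        transient Class<?> resolvedClass; // transient: rebuilt on every deployment

        void setup() {
            // Port setup has not happened yet, so the class is not visible here.
            log.add("setup: tupleClass=" + input.tupleClass);
        }

        void activate() {
            // Port setup has already run, so the class can be resolved now.
            resolvedClass = input.tupleClass;
            log.add("activate: tupleClass=" + resolvedClass.getSimpleName());
        }
    }

    // Mimics the call order from the message above (an assumption of this sketch).
    static PojoOperator deploy(Class<?> tupleClass) {
        PojoOperator op = new PojoOperator();
        op.setup();                 // 1. operator setup
        op.input.setup(tupleClass); // 2. port setup populates the tuple class
        op.activate();              // 3. operator activate
        return op;
    }

    public static void main(String[] args) {
        PojoOperator op = deploy(String.class);
        op.log.forEach(System.out::println);
    }
}
```

In this sketch, anything that depends on the tuple class (for example, building getters for POJO fields) would have to be created in activate(), since setup() still sees a null class.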
~ Bhupesh

On Aug 5, 2017 06:01, "Pramod Immaneni" <pra...@datatorrent.com> wrote:

> It may not end up making a big difference, but it prevents any worker
> threads you may have from starting off ahead of processing, resulting in
> unnecessary queue build-up right from the start. This might still be the
> case later in the processing, but if starting off on the wrong foot can
> be avoided, why not start the right way? Today the difference may be a
> few hundred milliseconds, but that is not a contract, and the interval of
> execution between setup and activate could be higher in the future.
>
> Thanks
>
> On Fri, Aug 4, 2017 at 4:39 AM, Vlad Rozov <v.rozo...@gmail.com> wrote:
>
> > This recommendation to use activate() over setup() is questionable with
> > the introduction of back pressure. In a distributed streaming
> > application operators need to handle downstream downtime, differences
> > between upstream and downstream throughput, busy output ports and back
> > pressure. A difference of a few hundred milliseconds between setup()
> > and activate() is not something that I would be concerned about as an
> > operator developer once the above conditions are handled.
> >
> > Thank you,
> >
> > Vlad
> >
> > On 8/3/17 15:37, Pramod Immaneni wrote:
> >
> >> Yes, activate is called closer to the start of tuple processing as far
> >> as Apex is concerned, so if you are doing things like writing an input
> >> operator that does asynchronous processing, where you will start
> >> receiving data as soon as you open a connection to your external
> >> source, it is better to do it in activate to reduce latency and buffer
> >> build-up.
> >>
> >> On Thu, Aug 3, 2017 at 3:07 PM, Vlad Rozov <v.rozo...@gmail.com> wrote:
> >>
> >>> Correct, both setup() and activate() are called when an operator is
> >>> restored from a checkpoint. When an operator is restored from a
> >>> checkpoint it is considered to be a new instance/deployment of an
> >>> operator with its state reset to a checkpoint. In this case Apex core
> >>> gives an operator a chance to initialize transient fields in either
> >>> setup() or activate().
> >>>
> >>> I am not aware of any use case where the platform will go through an
> >>> activate/deactivate cycle without setup/teardown, but such a code path
> >>> may be introduced in the future (for example, it may be used to manage
> >>> an input operator with a high emit rate). It is better not to make any
> >>> assumptions on how many times activate/deactivate may be called.
> >>>
> >>> Currently the main difference between setup() and activate() is
> >>> described in the javadoc for ActivationListener:
> >>>
> >>>  * An example of where one would consider implementing ActivationListener is an
> >>>  * input operator which wants to consume a high throughput stream. Since there is
> >>>  * typically at least a few hundreds of milliseconds between the time the setup method
> >>>  * is called and the first window, you would want to place the code to activate the
> >>>  * stream inside activate instead of setup.
> >>>
> >>> My recommendation is to use setup() to initialize transient fields
> >>> unless you need to deal with the above case.
> >>>
> >>> Thank you,
> >>>
> >>> Vlad
> >>>
> >>> On 8/2/17 13:31, Ananth G wrote:
> >>>
> >>>> Hello Vlad,
> >>>>
> >>>> Thanks for your response.
> >>>>
> >>>>> Do you refer to restoring from a checkpoint as serialize/deserialize
> >>>>> cycles?
> >>>>
> >>>> Yes.
> >>>>
> >>>>> In case of restoring from a checkpoint (deserialization) setup() is a
> >>>>> part of a redeployment request, AFAIK.
> >>>>
> >>>> This sounds a bit in contradiction to the response from Sanjay in the
> >>>> mail thread below. I tried to quickly glance at the apex-core code and
> >>>> it looks like both are being called (perhaps I am entirely wrong on
> >>>> this, as it was only a quick scan). I was referring to the code in
> >>>> StreamingContainer.java in the engine package and the method called
> >>>> deploy().
> >>>>
> >>>>> Please see ActivationListener javadoc for details when it is necessary
> >>>>> to use activate() vs setup().
> >>>>
> >>>> I had to raise this question in the mail after going through the
> >>>> javadoc. The javadoc is a bit cryptic in this scenario of
> >>>> serialise/deserialize. Also, the javadoc is not clear as to what is
> >>>> meant by activate/deactivate being called multiple times whereas setup
> >>>> is called once in the lifetime of the operator. If setup is called
> >>>> once in the lifetime of an operator per the javadoc, does that mean
> >>>> once in the lifetime of the JVM instantiating via the constructor, or
> >>>> across the deserialise cycles of the passivated operator state? If it
> >>>> is once across all passivated instances of the operator, then setup()
> >>>> would not be called multiple times and hence is not a great location
> >>>> for transient variables? If setup() is called across deserialise
> >>>> cycles, then I find it more confusing as to why we need setup() and
> >>>> activate() methods having almost the same functionality.
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Regards,
> >>>> Ananth
> >>>>
> >>>> On 1 Aug 2017, at 3:38 am, Vlad Rozov <v.ro...@datatorrent.com> wrote:
> >>>>
> >>>>> Do you refer to restoring from a checkpoint as serialize/deserialize
> >>>>> cycles? There are no calls to setup/teardown and/or
> >>>>> activate/deactivate during checkpointing/serialization. In case of
> >>>>> restoring from a checkpoint (deserialization) setup() is a part of a
> >>>>> redeployment request, AFAIK. The best answer to question 3 is: it
> >>>>> depends. In most cases using setup() to resolve all transient fields
> >>>>> is as good as doing that in activate(). Please see ActivationListener
> >>>>> javadoc for details when it is necessary to use activate() vs setup().
> >>>>>
> >>>>> Thank you,
> >>>>>
> >>>>> Vlad
> >>>>>
> >>>>> On 7/29/17 19:58, Sanjay Pujare wrote:
> >>>>>
> >>>>>> The Javadoc comment for
> >>>>>> com.datatorrent.api.Operator.ActivationListener<CONTEXT> (in
> >>>>>> https://github.com/apache/apex-core/blob/master/api/src/main/java/com/datatorrent/api/Operator.java)
> >>>>>> should hopefully answer your questions.
> >>>>>>
> >>>>>> Specifically:
> >>>>>>
> >>>>>> 1. No, setup() is called only once in the entire lifetime
> >>>>>> (http://apex.apache.org/docs/apex/operator_development/#setup-call)
> >>>>>>
> >>>>>> 2. Yes. When an operator is "activated" - first time in its life or
> >>>>>> reactivation after a failover - activate() is called before the
> >>>>>> first beginWindow() is called.
> >>>>>>
> >>>>>> 3. Yes.
> >>>>>>
> >>>>>> On Sun, Jul 30, 2017 at 12:18 AM, Ananth G <ananthg.a...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello All,
> >>>>>>>
> >>>>>>> I was looking at the documentation and could not get a clear
> >>>>>>> distinction of behaviours for setup() and activate() during
> >>>>>>> scenarios when an operator is passivated (ex: application shutdown,
> >>>>>>> repartition use cases) and being brought back to life again. Could
> >>>>>>> someone from the community advise me on the following questions?
> >>>>>>>
> >>>>>>> 1. Is setup() called in these scenarios (serialize/deserialize
> >>>>>>> cycles) as well?
> >>>>>>>
> >>>>>>> 2. I am assuming activate() is called in these scenarios? The
> >>>>>>> javadoc for activation states that activate() can be called
> >>>>>>> multiple times (without explicitly stating why) and my assumption
> >>>>>>> is that it is because of these scenarios.
> >>>>>>>
> >>>>>>> 3. If setup() is only called once during the lifetime of an
> >>>>>>> operator, is it fair to assume that activate() is the best place to
> >>>>>>> resolve all of the transient fields of an operator?
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Ananth