Which runner's bundling are you concerned with? It sounds like the Flink runner?
Kenn On Fri, Feb 16, 2018 at 9:04 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote: > > 2018-02-16 17:59 GMT+01:00 Kenneth Knowles <k...@google.com>: > >> What I am hearing is this: >> >> - @FinishBundle does what you want (a reliable "flush" call) but your >> runner is not doing a good job of bundling >> > > Nop, finishbundle is defined but not a bundle. Typically for 1 million > rows I'll get 1 million calls in flink and 1 call in spark (today) so this > is not a way to call a final task to release dofn internal instances or do > some one time auditing. > > >> - @Teardown has well-defined semantics and they are not what you want >> > > " > Note that calls to the annotated method are best effort, and may not occur > for arbitrary reasons" > > is not really "well-defined" and is also a breaking change compared to the > < 2.3.x (x >= 1) . > > >> So you are hoping for something that is called less frequently but is >> still mandatory. >> >> Just trying to establish the basics to start over and get this on track >> to solving the real problem. >> > > Concretely I need a well defined lifecycle for any DoFn executed in beam > and today there is no such a thing making it impossible to develop > correctly transforms/fn on an user side. > > >> >> Kenn >> >> >> On Fri, Feb 16, 2018 at 8:51 AM, Romain Manni-Bucau < >> rmannibu...@gmail.com> wrote: >> >>> finish bundle is well defined and must be called, right, not at the end >>> so you still miss teardown as a user. Bundles are defined by the runner and >>> you can have 100000 bundles per batch (even more for a stream ;)) so you >>> dont want to release your resources or handle you execution auditing in it, >>> you want it at the end so in tear down. >>> >>> So yes we must have teardown reliable somehow. >>> >>> >>> Romain Manni-Bucau >>> @rmannibucau <https://twitter.com/rmannibucau> | Blog >>> <https://rmannibucau.metawerx.net/> | Old Blog >>> <http://rmannibucau.wordpress.com> | Github >>> <https://github.com/rmannibucau> | LinkedIn >>> <https://www.linkedin.com/in/rmannibucau> | Book >>> <https://www.packtpub.com/application-development/java-ee-8-high-performance> >>> >>> 2018-02-16 17:43 GMT+01:00 Reuven Lax <re...@google.com>: >>> >>>> +1 I think @FinishBundle is the right thing to look at here. >>>> >>>> On Fri, Feb 16, 2018, 8:41 AM Jean-Baptiste Onofré <j...@nanthrax.net> >>>> wrote: >>>> >>>>> Hi Romain >>>>> >>>>> Is it not @FinishBundle your solution ? >>>>> >>>>> Regards >>>>> JB >>>>> Le 16 févr. 2018, à 17:06, Romain Manni-Bucau <rmannibu...@gmail.com> >>>>> a écrit: >>>>>> >>>>>> I see Reuven, so it is actually a broken contract for end users more >>>>>> than a bug. Concretely a user must have a way to execute code once the >>>>>> teardown is no more used and a teardown is populated by the user in the >>>>>> context of an execution. >>>>>> It means that if the environment wants to pool (cache) the instances >>>>>> it must provide a postBorrowFromCache and preReturnToCache to let the >>>>>> user >>>>>> handle that - we'll get back to EJB and passivation ;). >>>>>> >>>>>> Personally I think it is fine to cache the instances for the duration >>>>>> of an execution but not accross execution. Concretely if you check out >>>>>> the >>>>>> API it should just not be possible for a runner since the lifecycle is >>>>>> not >>>>>> covered and the fact teardown can not be called today is an >>>>>> implementation >>>>>> bug/leak surfacing the API. >>>>>> >>>>>> So I see 2 options: >>>>>> >>>>>> 1. make it mandatory and get rid of the caching - which shouldnt help >>>>>> much in current state in terms of perf >>>>>> 2. keep teardown a final release object (which is not that useful >>>>>> cause of the end of the sentence) and add a clean cache lifecycle >>>>>> management >>>>>> >>>>>> tempted to say 1 is saner short terms, in particular cause beam is >>>>>> 2.x and users already use it this way. >>>>>> >>>>>> >>>>>> >>>>>> Romain Manni-Bucau >>>>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog >>>>>> <https://rmannibucau.metawerx.net/> | Old Blog >>>>>> <http://rmannibucau.wordpress.com> | Github >>>>>> <https://github.com/rmannibucau> | LinkedIn >>>>>> <https://www.linkedin.com/in/rmannibucau> | Book >>>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance> >>>>>> >>>>>> 2018-02-16 16:53 GMT+01:00 Reuven Lax <re...@google.com>: >>>>>> >>>>>>> So the concern is that @TearDown might not be called? >>>>>>> >>>>>>> Let's understand the reason for @TearDown. The runner is free to >>>>>>> cache the DoFn object across many invocations, and indeed in streaming >>>>>>> this >>>>>>> is often a critical optimization. However if the runner does decide to >>>>>>> destroy the DoFn object (e.g. because it's being evicted from cache), >>>>>>> often >>>>>>> users need a callback to tear down associated resources (file handles, >>>>>>> RPC >>>>>>> connections, etc.). >>>>>>> >>>>>>> Now @TearDown isn't guaranteed to be called for a simple reason: the >>>>>>> runner might never tear down the DoFn object! The runner might well >>>>>>> decide >>>>>>> to cache the object forever, in which case there is never .a time to >>>>>>> call >>>>>>> @TearDown. There is no violation of semantics here. >>>>>>> >>>>>>> Also, the point about not calling teardown if the JVM crashes might >>>>>>> well sound implicit with no need to mention it. However empirically >>>>>>> users >>>>>>> do misunderstand even this, so it's worth mentioning. >>>>>>> >>>>>>> Reuven >>>>>>> >>>>>>> On Fri, Feb 16, 2018 at 2:11 AM, Romain Manni-Bucau < >>>>>>> rmannibu...@gmail.com> wrote: >>>>>>> >>>>>>>> Hi guys, >>>>>>>> >>>>>>>> I'm a bit concerned of this PR https://github.com/apache/beam >>>>>>>> /pull/4637 >>>>>>>> >>>>>>>> I understand the intent but I'd like to share how I see it and why >>>>>>>> it is an issue for me: >>>>>>>> >>>>>>>> 1. you can't help if the JVM crash in any case. Tomcat had a try to >>>>>>>> preallocate some memory for instance to free it in case of OOME and >>>>>>>> try to >>>>>>>> recover but it never prooved useful and got dropped recently. This is a >>>>>>>> good example you can't do anything if there is a cataclism and >>>>>>>> therefore >>>>>>>> any framework or lib will not be blamed for it >>>>>>>> 2. if you expose an API, its behavior must be well defined. In the >>>>>>>> case of a portable library like Beam it is even more important >>>>>>>> otherwise it >>>>>>>> leads users to not use the API or the projet :(. >>>>>>>> >>>>>>>> >>>>>>>> These two points lead to say that if the JVM crashes it is ok to >>>>>>>> not call teardown and it is even implicit in any programming >>>>>>>> environment so >>>>>>>> no need to mention it. However that a runner doesn't call teardown is >>>>>>>> a bug >>>>>>>> and not a feature or something intended because it can have a huge >>>>>>>> impact >>>>>>>> on the user flow. >>>>>>>> >>>>>>>> The user workarounds are to use custom threads with timeouts to >>>>>>>> execute the actions or things like that, all bad solutions to replace a >>>>>>>> buggy API, if you remove the contract guarantee. >>>>>>>> >>>>>>>> To make it obvious: substring(from, to): will substring the current >>>>>>>> string between from and to...or not. Would you use the function? >>>>>>>> >>>>>>>> What I ask is to add in the javadoc that the contract enforces the >>>>>>>> runner to call that. Which means the core doesn't guarantee it but >>>>>>>> imposes >>>>>>>> the runner to do so. This way the not portable behavior is where it >>>>>>>> belongs >>>>>>>> to, in the vendor specific code. It leads to a reliable API for the end >>>>>>>> user and let runners document they don't respect - yet - the API when >>>>>>>> relevant. >>>>>>>> >>>>>>>> wdyt? >>>>>>>> >>>>>>>> Romain Manni-Bucau >>>>>>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog >>>>>>>> <https://rmannibucau.metawerx.net/> | Old Blog >>>>>>>> <http://rmannibucau.wordpress.com> | Github >>>>>>>> <https://github.com/rmannibucau> | LinkedIn >>>>>>>> <https://www.linkedin.com/in/rmannibucau> | Book >>>>>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>> >> >