Re: @TearDown guarantees

Reuven Lax Fri, 16 Feb 2018 08:58:44 -0800

@TearDown refers to DoFn teardown not process teardown (it's basically a
destructor). So it's also runner defined.


There may be a place for a container that lives as long as the process (not
tied to the DoFn life). However that would be something new to add.

On Fri, Feb 16, 2018, 8:52 AM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> finish bundle is well defined and must be called, right, not at the end so
> you still miss teardown as a user. Bundles are defined by the runner and
> you can have 100000 bundles per batch (even more for a stream ;)) so you
> dont want to release your resources or handle you execution auditing in it,
> you want it at the end so in tear down.
>
> So yes we must have teardown reliable somehow.
>
>
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
> <https://rmannibucau.metawerx.net/> | Old Blog
> <http://rmannibucau.wordpress.com> | Github
> <https://github.com/rmannibucau> | LinkedIn
> <https://www.linkedin.com/in/rmannibucau> | Book
> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>
> 2018-02-16 17:43 GMT+01:00 Reuven Lax <re...@google.com>:
>
>> +1 I think @FinishBundle is the right thing to look at here.
>>
>> On Fri, Feb 16, 2018, 8:41 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi Romain
>>>
>>> Is it not @FinishBundle your solution ?
>>>
>>> Regards
>>> JB
>>> Le 16 févr. 2018, à 17:06, Romain Manni-Bucau <rmannibu...@gmail.com> a
>>> écrit:
>>>>
>>>> I see Reuven, so it is actually a broken contract for end users more
>>>> than a bug. Concretely a user must have a way to execute code once the
>>>> teardown is no more used and a teardown is populated by the user in the
>>>> context of an execution.
>>>> It means that if the environment wants to pool (cache) the instances it
>>>> must provide a postBorrowFromCache and preReturnToCache to let the user
>>>> handle that - we'll get back to EJB and passivation ;).
>>>>
>>>> Personally I think it is fine to cache the instances for the duration
>>>> of an execution but not accross execution. Concretely if you check out the
>>>> API it should just not be possible for a runner since the lifecycle is not
>>>> covered and the fact teardown can not be called today is an implementation
>>>> bug/leak surfacing the API.
>>>>
>>>> So I see 2 options:
>>>>
>>>> 1. make it mandatory and get rid of the caching - which shouldnt help
>>>> much in current state in terms of perf
>>>> 2. keep teardown a final release object (which is not that useful cause
>>>> of the end of the sentence) and add a clean cache lifecycle management
>>>>
>>>> tempted to say 1 is saner short terms, in particular cause beam is 2.x
>>>> and users already use it this way.
>>>>
>>>>
>>>>
>>>> Romain Manni-Bucau
>>>> @rmannibucau <https://twitter.com/rmannibucau> |   Blog
>>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>> <http://rmannibucau.wordpress.com> |  Github
>>>> <https://github.com/rmannibucau> | LinkedIn
>>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>>
>>>> 2018-02-16 16:53 GMT+01:00 Reuven Lax <re...@google.com>:
>>>>
>>>>> So the concern is that @TearDown might not be called?
>>>>>
>>>>> Let's understand the reason for @TearDown. The runner is free to cache
>>>>> the DoFn object across many invocations, and indeed in streaming this is
>>>>> often a critical optimization. However if the runner does decide to 
>>>>> destroy
>>>>> the DoFn object (e.g. because it's being evicted from cache), often users
>>>>> need a callback to tear down associated resources (file handles, RPC
>>>>> connections, etc.).
>>>>>
>>>>> Now @TearDown isn't guaranteed to be called for a simple reason: the
>>>>> runner might never tear down the DoFn object! The runner might well decide
>>>>> to cache the object forever, in which case there is never .a time to call
>>>>> @TearDown. There is no violation of semantics here.
>>>>>
>>>>> Also, the point about not calling teardown if the JVM crashes might
>>>>> well sound implicit with no need to mention it. However empirically users
>>>>> do misunderstand even this, so it's worth mentioning.
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Fri, Feb 16, 2018 at 2:11 AM, Romain Manni-Bucau <
>>>>> rmannibu...@gmail.com> wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I'm a bit concerned of this PR
>>>>>> https://github.com/apache/beam/pull/4637
>>>>>>
>>>>>> I understand the intent but I'd like to share how I see it and why it
>>>>>> is an issue for me:
>>>>>>
>>>>>> 1. you can't help if the JVM crash in any case. Tomcat had a try to
>>>>>> preallocate some memory for instance to free it in case of OOME and try 
>>>>>> to
>>>>>> recover but it never prooved useful and got dropped recently. This is a
>>>>>> good example you can't do anything if there is a cataclism and therefore
>>>>>> any framework or lib will not be blamed for it
>>>>>> 2. if you expose an API, its behavior must be well defined. In the
>>>>>> case of a portable library like Beam it is even more important otherwise 
>>>>>> it
>>>>>> leads users to not use the API or the projet :(.
>>>>>>
>>>>>>
>>>>>> These two points lead to say that if the JVM crashes it is ok to not
>>>>>> call teardown and it is even implicit in any programming environment so 
>>>>>> no
>>>>>> need to mention it. However that a runner doesn't call teardown is a bug
>>>>>> and not a feature or something intended because it can have a huge impact
>>>>>> on the user flow.
>>>>>>
>>>>>> The user workarounds are to use custom threads with timeouts to
>>>>>> execute the actions or things like that, all bad solutions to replace a
>>>>>> buggy API, if you remove the contract guarantee.
>>>>>>
>>>>>> To make it obvious: substring(from, to): will substring the current
>>>>>> string between from and to...or not. Would you use the function?
>>>>>>
>>>>>> What I ask is to add in the javadoc that the contract enforces the
>>>>>> runner to call that. Which means the core doesn't guarantee it but 
>>>>>> imposes
>>>>>> the runner to do so. This way the not portable behavior is where it 
>>>>>> belongs
>>>>>> to, in the vendor specific code. It leads to a reliable API for the end
>>>>>> user and let runners document they don't respect - yet - the API when
>>>>>> relevant.
>>>>>>
>>>>>> wdyt?
>>>>>>
>>>>>> Romain Manni-Bucau
>>>>>> @rmannibucau <https://twitter.com/rmannibucau> |   Blog
>>>>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>>>> <http://rmannibucau.wordpress.com> |  Github
>>>>>> <https://github.com/rmannibucau> | LinkedIn
>>>>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>>>>
>>>>>
>>>>>
>>>>
>

Re: @TearDown guarantees

Reply via email to