On Fri, Feb 16, 2018 at 8:06 AM, Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> I see Reuven, so it is actually a broken contract for end users more than
> a bug. Concretely a user must have a way to execute code once the teardown
> is no more used and a teardown is populated by the user in the context of
> an execution.
>

I don't understand what contract you think is broken. @FinishBundle has a
contract - it's guaranteed to be called even in the case of JVM crashes (in
that case the records will be reprocessed). @TearDown exists for resource
cleanup, so does not have this contract.

It means that if the environment wants to pool (cache) the instances it
> must provide a postBorrowFromCache and preReturnToCache to let the user
> handle that - we'll get back to EJB and passivation ;).
>

Let's not focus on caching. The runner might immediately reuse the DoFn (if
there is more data available).


>
> Personally I think it is fine to cache the instances for the duration of
> an execution but not accross execution. Concretely if you check out the API
> it should just not be possible for a runner since the lifecycle is not
> covered and the fact teardown can not be called today is an implementation
> bug/leak surfacing the API.
>

Across execution of what? records/bundles?  Being able to reuse DoFns
across bundles is actually critical to performance of streaming, and there
is data to back that.


> So I see 2 options:
>
> 1. make it mandatory and get rid of the caching - which shouldnt help much
> in current state in terms of perf
>

Why do you think caching doesn't help? It is the number one optimization
for streaming pipelines. There are a number of extremely high-volume
streaming pipelines that would not be able to run affordably without
caching.

2. keep teardown a final release object (which is not that useful cause of
> the end of the sentence) and add a clean cache lifecycle management
>
> tempted to say 1 is saner short terms, in particular cause beam is 2.x and
> users already use it this way.
>
>
>
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
> <https://rmannibucau.metawerx.net/> | Old Blog
> <http://rmannibucau.wordpress.com> | Github
> <https://github.com/rmannibucau> | LinkedIn
> <https://www.linkedin.com/in/rmannibucau> | Book
> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>
> 2018-02-16 16:53 GMT+01:00 Reuven Lax <re...@google.com>:
>
>> So the concern is that @TearDown might not be called?
>>
>> Let's understand the reason for @TearDown. The runner is free to cache
>> the DoFn object across many invocations, and indeed in streaming this is
>> often a critical optimization. However if the runner does decide to destroy
>> the DoFn object (e.g. because it's being evicted from cache), often users
>> need a callback to tear down associated resources (file handles, RPC
>> connections, etc.).
>>
>> Now @TearDown isn't guaranteed to be called for a simple reason: the
>> runner might never tear down the DoFn object! The runner might well decide
>> to cache the object forever, in which case there is never .a time to call
>> @TearDown. There is no violation of semantics here.
>>
>> Also, the point about not calling teardown if the JVM crashes might well
>> sound implicit with no need to mention it. However empirically users do
>> misunderstand even this, so it's worth mentioning.
>>
>> Reuven
>>
>> On Fri, Feb 16, 2018 at 2:11 AM, Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> I'm a bit concerned of this PR https://github.com/apache/beam/pull/4637
>>>
>>> I understand the intent but I'd like to share how I see it and why it is
>>> an issue for me:
>>>
>>> 1. you can't help if the JVM crash in any case. Tomcat had a try to
>>> preallocate some memory for instance to free it in case of OOME and try to
>>> recover but it never prooved useful and got dropped recently. This is a
>>> good example you can't do anything if there is a cataclism and therefore
>>> any framework or lib will not be blamed for it
>>> 2. if you expose an API, its behavior must be well defined. In the case
>>> of a portable library like Beam it is even more important otherwise it
>>> leads users to not use the API or the projet :(.
>>>
>>>
>>> These two points lead to say that if the JVM crashes it is ok to not
>>> call teardown and it is even implicit in any programming environment so no
>>> need to mention it. However that a runner doesn't call teardown is a bug
>>> and not a feature or something intended because it can have a huge impact
>>> on the user flow.
>>>
>>> The user workarounds are to use custom threads with timeouts to execute
>>> the actions or things like that, all bad solutions to replace a buggy API,
>>> if you remove the contract guarantee.
>>>
>>> To make it obvious: substring(from, to): will substring the current
>>> string between from and to...or not. Would you use the function?
>>>
>>> What I ask is to add in the javadoc that the contract enforces the
>>> runner to call that. Which means the core doesn't guarantee it but imposes
>>> the runner to do so. This way the not portable behavior is where it belongs
>>> to, in the vendor specific code. It leads to a reliable API for the end
>>> user and let runners document they don't respect - yet - the API when
>>> relevant.
>>>
>>> wdyt?
>>>
>>> Romain Manni-Bucau
>>> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>> <http://rmannibucau.wordpress.com> | Github
>>> <https://github.com/rmannibucau> | LinkedIn
>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>
>>
>>
>

Reply via email to