What I am hearing is this:

 - @FinishBundle does what you want (a reliable "flush" call) but your
runner is not doing a good job of bundling
 - @Teardown has well-defined semantics and they are not what you want

So you are hoping for something that is called less frequently but is still
mandatory.

Just trying to establish the basics to start over and get this on track to
solving the real problem.

Kenn


On Fri, Feb 16, 2018 at 8:51 AM, Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> finish bundle is well defined and must be called, right, not at the end so
> you still miss teardown as a user. Bundles are defined by the runner and
> you can have 100000 bundles per batch (even more for a stream ;)) so you
> dont want to release your resources or handle you execution auditing in it,
> you want it at the end so in tear down.
>
> So yes we must have teardown reliable somehow.
>
>
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
> <https://rmannibucau.metawerx.net/> | Old Blog
> <http://rmannibucau.wordpress.com> | Github
> <https://github.com/rmannibucau> | LinkedIn
> <https://www.linkedin.com/in/rmannibucau> | Book
> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>
> 2018-02-16 17:43 GMT+01:00 Reuven Lax <re...@google.com>:
>
>> +1 I think @FinishBundle is the right thing to look at here.
>>
>> On Fri, Feb 16, 2018, 8:41 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi Romain
>>>
>>> Is it not @FinishBundle your solution ?
>>>
>>> Regards
>>> JB
>>> Le 16 févr. 2018, à 17:06, Romain Manni-Bucau <rmannibu...@gmail.com> a
>>> écrit:
>>>>
>>>> I see Reuven, so it is actually a broken contract for end users more
>>>> than a bug. Concretely a user must have a way to execute code once the
>>>> teardown is no more used and a teardown is populated by the user in the
>>>> context of an execution.
>>>> It means that if the environment wants to pool (cache) the instances it
>>>> must provide a postBorrowFromCache and preReturnToCache to let the user
>>>> handle that - we'll get back to EJB and passivation ;).
>>>>
>>>> Personally I think it is fine to cache the instances for the duration
>>>> of an execution but not accross execution. Concretely if you check out the
>>>> API it should just not be possible for a runner since the lifecycle is not
>>>> covered and the fact teardown can not be called today is an implementation
>>>> bug/leak surfacing the API.
>>>>
>>>> So I see 2 options:
>>>>
>>>> 1. make it mandatory and get rid of the caching - which shouldnt help
>>>> much in current state in terms of perf
>>>> 2. keep teardown a final release object (which is not that useful cause
>>>> of the end of the sentence) and add a clean cache lifecycle management
>>>>
>>>> tempted to say 1 is saner short terms, in particular cause beam is 2.x
>>>> and users already use it this way.
>>>>
>>>>
>>>>
>>>> Romain Manni-Bucau
>>>> @rmannibucau <https://twitter.com/rmannibucau> |   Blog
>>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>> <http://rmannibucau.wordpress.com> |  Github
>>>> <https://github.com/rmannibucau> | LinkedIn
>>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>>
>>>> 2018-02-16 16:53 GMT+01:00 Reuven Lax <re...@google.com>:
>>>>
>>>>> So the concern is that @TearDown might not be called?
>>>>>
>>>>> Let's understand the reason for @TearDown. The runner is free to cache
>>>>> the DoFn object across many invocations, and indeed in streaming this is
>>>>> often a critical optimization. However if the runner does decide to 
>>>>> destroy
>>>>> the DoFn object (e.g. because it's being evicted from cache), often users
>>>>> need a callback to tear down associated resources (file handles, RPC
>>>>> connections, etc.).
>>>>>
>>>>> Now @TearDown isn't guaranteed to be called for a simple reason: the
>>>>> runner might never tear down the DoFn object! The runner might well decide
>>>>> to cache the object forever, in which case there is never .a time to call
>>>>> @TearDown. There is no violation of semantics here.
>>>>>
>>>>> Also, the point about not calling teardown if the JVM crashes might
>>>>> well sound implicit with no need to mention it. However empirically users
>>>>> do misunderstand even this, so it's worth mentioning.
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Fri, Feb 16, 2018 at 2:11 AM, Romain Manni-Bucau <
>>>>> rmannibu...@gmail.com> wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I'm a bit concerned of this PR  https://github.com/apache/beam
>>>>>> /pull/4637
>>>>>>
>>>>>> I understand the intent but I'd like to share how I see it and why it
>>>>>> is an issue for me:
>>>>>>
>>>>>> 1. you can't help if the JVM crash in any case. Tomcat had a try to
>>>>>> preallocate some memory for instance to free it in case of OOME and try 
>>>>>> to
>>>>>> recover but it never prooved useful and got dropped recently. This is a
>>>>>> good example you can't do anything if there is a cataclism and therefore
>>>>>> any framework or lib will not be blamed for it
>>>>>> 2. if you expose an API, its behavior must be well defined. In the
>>>>>> case of a portable library like Beam it is even more important otherwise 
>>>>>> it
>>>>>> leads users to not use the API or the projet :(.
>>>>>>
>>>>>>
>>>>>> These two points lead to say that if the JVM crashes it is ok to not
>>>>>> call teardown and it is even implicit in any programming environment so 
>>>>>> no
>>>>>> need to mention it. However that a runner doesn't call teardown is a bug
>>>>>> and not a feature or something intended because it can have a huge impact
>>>>>> on the user flow.
>>>>>>
>>>>>> The user workarounds are to use custom threads with timeouts to
>>>>>> execute the actions or things like that, all bad solutions to replace a
>>>>>> buggy API, if you remove the contract guarantee.
>>>>>>
>>>>>> To make it obvious: substring(from, to): will substring the current
>>>>>> string between from and to...or not. Would you use the function?
>>>>>>
>>>>>> What I ask is to add in the javadoc that the contract enforces the
>>>>>> runner to call that. Which means the core doesn't guarantee it but 
>>>>>> imposes
>>>>>> the runner to do so. This way the not portable behavior is where it 
>>>>>> belongs
>>>>>> to, in the vendor specific code. It leads to a reliable API for the end
>>>>>> user and let runners document they don't respect - yet - the API when
>>>>>> relevant.
>>>>>>
>>>>>> wdyt?
>>>>>>
>>>>>> Romain Manni-Bucau
>>>>>> @rmannibucau <https://twitter.com/rmannibucau> |   Blog
>>>>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>>>> <http://rmannibucau.wordpress.com> |  Github
>>>>>> <https://github.com/rmannibucau> | LinkedIn
>>>>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>>>>
>>>>>
>>>>>
>>>>
>

Reply via email to