Which runner's bundling are you concerned with? It sounds like the Flink
runner?

Kenn


On Fri, Feb 16, 2018 at 9:04 AM, Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

>
> 2018-02-16 17:59 GMT+01:00 Kenneth Knowles <k...@google.com>:
>
>> What I am hearing is this:
>>
>>  - @FinishBundle does what you want (a reliable "flush" call) but your
>> runner is not doing a good job of bundling
>>
>
> Nop, finishbundle is defined but not a bundle. Typically for 1 million
> rows I'll get 1 million calls in flink and 1 call in spark (today) so this
> is not a way to call a final task to release dofn internal instances or do
> some one time auditing.
>
>
>>  - @Teardown has well-defined semantics and they are not what you want
>>
>
> "
> Note that calls to the annotated method are best effort, and may not occur
> for arbitrary reasons"
>
> is not really "well-defined" and is also a breaking change compared to the
> < 2.3.x (x >= 1) .
>
>
>> So you are hoping for something that is called less frequently but is
>> still mandatory.
>>
>> Just trying to establish the basics to start over and get this on track
>> to solving the real problem.
>>
>
> Concretely I need a well defined lifecycle for any DoFn executed in beam
> and today there is no such a thing making it impossible to develop
> correctly transforms/fn on an user side.
>
>
>>
>> Kenn
>>
>>
>> On Fri, Feb 16, 2018 at 8:51 AM, Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> finish bundle is well defined and must be called, right, not at the end
>>> so you still miss teardown as a user. Bundles are defined by the runner and
>>> you can have 100000 bundles per batch (even more for a stream ;)) so you
>>> dont want to release your resources or handle you execution auditing in it,
>>> you want it at the end so in tear down.
>>>
>>> So yes we must have teardown reliable somehow.
>>>
>>>
>>> Romain Manni-Bucau
>>> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>> <http://rmannibucau.wordpress.com> | Github
>>> <https://github.com/rmannibucau> | LinkedIn
>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>
>>> 2018-02-16 17:43 GMT+01:00 Reuven Lax <re...@google.com>:
>>>
>>>> +1 I think @FinishBundle is the right thing to look at here.
>>>>
>>>> On Fri, Feb 16, 2018, 8:41 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> Hi Romain
>>>>>
>>>>> Is it not @FinishBundle your solution ?
>>>>>
>>>>> Regards
>>>>> JB
>>>>> Le 16 févr. 2018, à 17:06, Romain Manni-Bucau <rmannibu...@gmail.com>
>>>>> a écrit:
>>>>>>
>>>>>> I see Reuven, so it is actually a broken contract for end users more
>>>>>> than a bug. Concretely a user must have a way to execute code once the
>>>>>> teardown is no more used and a teardown is populated by the user in the
>>>>>> context of an execution.
>>>>>> It means that if the environment wants to pool (cache) the instances
>>>>>> it must provide a postBorrowFromCache and preReturnToCache to let the 
>>>>>> user
>>>>>> handle that - we'll get back to EJB and passivation ;).
>>>>>>
>>>>>> Personally I think it is fine to cache the instances for the duration
>>>>>> of an execution but not accross execution. Concretely if you check out 
>>>>>> the
>>>>>> API it should just not be possible for a runner since the lifecycle is 
>>>>>> not
>>>>>> covered and the fact teardown can not be called today is an 
>>>>>> implementation
>>>>>> bug/leak surfacing the API.
>>>>>>
>>>>>> So I see 2 options:
>>>>>>
>>>>>> 1. make it mandatory and get rid of the caching - which shouldnt help
>>>>>> much in current state in terms of perf
>>>>>> 2. keep teardown a final release object (which is not that useful
>>>>>> cause of the end of the sentence) and add a clean cache lifecycle
>>>>>> management
>>>>>>
>>>>>> tempted to say 1 is saner short terms, in particular cause beam is
>>>>>> 2.x and users already use it this way.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Romain Manni-Bucau
>>>>>> @rmannibucau <https://twitter.com/rmannibucau> |   Blog
>>>>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>>>> <http://rmannibucau.wordpress.com> |  Github
>>>>>> <https://github.com/rmannibucau> | LinkedIn
>>>>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>>>>
>>>>>> 2018-02-16 16:53 GMT+01:00 Reuven Lax <re...@google.com>:
>>>>>>
>>>>>>> So the concern is that @TearDown might not be called?
>>>>>>>
>>>>>>> Let's understand the reason for @TearDown. The runner is free to
>>>>>>> cache the DoFn object across many invocations, and indeed in streaming 
>>>>>>> this
>>>>>>> is often a critical optimization. However if the runner does decide to
>>>>>>> destroy the DoFn object (e.g. because it's being evicted from cache), 
>>>>>>> often
>>>>>>> users need a callback to tear down associated resources (file handles, 
>>>>>>> RPC
>>>>>>> connections, etc.).
>>>>>>>
>>>>>>> Now @TearDown isn't guaranteed to be called for a simple reason: the
>>>>>>> runner might never tear down the DoFn object! The runner might well 
>>>>>>> decide
>>>>>>> to cache the object forever, in which case there is never .a time to 
>>>>>>> call
>>>>>>> @TearDown. There is no violation of semantics here.
>>>>>>>
>>>>>>> Also, the point about not calling teardown if the JVM crashes might
>>>>>>> well sound implicit with no need to mention it. However empirically 
>>>>>>> users
>>>>>>> do misunderstand even this, so it's worth mentioning.
>>>>>>>
>>>>>>> Reuven
>>>>>>>
>>>>>>> On Fri, Feb 16, 2018 at 2:11 AM, Romain Manni-Bucau <
>>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi guys,
>>>>>>>>
>>>>>>>> I'm a bit concerned of this PR  https://github.com/apache/beam
>>>>>>>> /pull/4637
>>>>>>>>
>>>>>>>> I understand the intent but I'd like to share how I see it and why
>>>>>>>> it is an issue for me:
>>>>>>>>
>>>>>>>> 1. you can't help if the JVM crash in any case. Tomcat had a try to
>>>>>>>> preallocate some memory for instance to free it in case of OOME and 
>>>>>>>> try to
>>>>>>>> recover but it never prooved useful and got dropped recently. This is a
>>>>>>>> good example you can't do anything if there is a cataclism and 
>>>>>>>> therefore
>>>>>>>> any framework or lib will not be blamed for it
>>>>>>>> 2. if you expose an API, its behavior must be well defined. In the
>>>>>>>> case of a portable library like Beam it is even more important 
>>>>>>>> otherwise it
>>>>>>>> leads users to not use the API or the projet :(.
>>>>>>>>
>>>>>>>>
>>>>>>>> These two points lead to say that if the JVM crashes it is ok to
>>>>>>>> not call teardown and it is even implicit in any programming 
>>>>>>>> environment so
>>>>>>>> no need to mention it. However that a runner doesn't call teardown is 
>>>>>>>> a bug
>>>>>>>> and not a feature or something intended because it can have a huge 
>>>>>>>> impact
>>>>>>>> on the user flow.
>>>>>>>>
>>>>>>>> The user workarounds are to use custom threads with timeouts to
>>>>>>>> execute the actions or things like that, all bad solutions to replace a
>>>>>>>> buggy API, if you remove the contract guarantee.
>>>>>>>>
>>>>>>>> To make it obvious: substring(from, to): will substring the current
>>>>>>>> string between from and to...or not. Would you use the function?
>>>>>>>>
>>>>>>>> What I ask is to add in the javadoc that the contract enforces the
>>>>>>>> runner to call that. Which means the core doesn't guarantee it but 
>>>>>>>> imposes
>>>>>>>> the runner to do so. This way the not portable behavior is where it 
>>>>>>>> belongs
>>>>>>>> to, in the vendor specific code. It leads to a reliable API for the end
>>>>>>>> user and let runners document they don't respect - yet - the API when
>>>>>>>> relevant.
>>>>>>>>
>>>>>>>> wdyt?
>>>>>>>>
>>>>>>>> Romain Manni-Bucau
>>>>>>>> @rmannibucau <https://twitter.com/rmannibucau> |   Blog
>>>>>>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>>>>>> <http://rmannibucau.wordpress.com> |  Github
>>>>>>>> <https://github.com/rmannibucau> | LinkedIn
>>>>>>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>>>>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>
>>
>

Reply via email to