Re: @TearDown guarantees

Thomas Groh Fri, 16 Feb 2018 10:08:23 -0800

On perf: Deserialization of an arbitrary object is expensive. This cost is
amortized over all of the elements that the object processes, but for a
runner with small bundles, that cost never gets meaningfully amortized -
deserializing a DoFn instance of unknown complexity to process one element
means that we're multiplying our decoding costs by potentially multiple
times. Reusing user Fns permits us to amortize across worker lifetimes,
which is many-times beneficial.


On resilience: You should distinguish "reliably" with "always". Users can
depend on "always" for correctness, but can't depend on something done
"reliably" for that. "Reliably" can be generally depended on for
performance, which is why Teardown exists.

Runners can call the Teardown method *almost always *if a DoFn instance
will not be reused (in the absence of arbitrary failures, the runner
*MUST* call
Teardown, according to the original spec), but *SHOULD* and *MUST* are
extremely different in terms of implementation requirements. If you say
that Teardown *MUST* be called, what this means is that a runner *MUST
NOT* have
resources that fail arbitrarily, and this is not an acceptable restriction
for any existing distributed backend.

If you need something which is always called once per element at a
granularity coarser than once per element, that is exactly what
FinishBundle provides, and is why the method exists.

The original proposal for Setup/Teardown:
https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit#

On Fri, Feb 16, 2018 at 9:39 AM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> So do I get it right a leak of Dataflow implementation impacts the API?
> Also sounds like this perf issues is due to a blind serialization instead
> of modelizing what is serialized - nothing should be slow enough in the
> serialization at that level, do you have more details on that particular
> point?
>
It also means you accept to leak particular instances data like password
> etc (all the @AutoValue builder ones typically) since you dont call - or
> not reliably - a postexecution hook which should get solved ASAP.
>

> @Thomas: I understand your update was to align the dataflow behavior to
> the API but actually the opposite should be done, align DataFlow impl on
> the API. If we disagree tearDown is [1;1] - I'm fine with that, then
> teardown is not really usable for users and we miss a
> "the fact that we leave the runner discretion on when it can call
> teardown does not make this poorly-defined; it means that users should not
> depend on teardown being called for correct behavior, and *this has
> always been true and will continue to be true*."
> This is not really the case, you say it yourself "[...] does not make
> this poorly-defined [...] it means that users should not depend on
> teardown". This literally means @TearDown is not part of the API. Once
> again I'm fine with it but this kind of API is needed.
> "*this has always been true and will continue to be true"*
> Not really too since it was not clear before and runner dependent so users
> can depend on it.
>
> With both statements I think it should just get fixed and made reliable
> which is technically possible IMHO instead of creating a new API which
> would make teardown a cache hook which is an implementation detail which
> shouldn't surface in the API.
>
> @AfterExecution. @FinishBundle is once the bundle finishes so not a
> "finally" for the dofn regarding the execution.
>
> Side note: the success callback hook which has been discussed N times
> doesnt match the need which is really per instance (= accessible from that
> particular instance and not globally) in both success and failure cases.
>
>
> 2018-02-16 18:18 GMT+01:00 Kenneth Knowles <k...@google.com>:
>
>> Which runner's bundling are you concerned with? It sounds like the Flink
>> runner?
>>
>
> Flink, Spark, DirectRunner, DataFlow at least (others would be good but
> are out of scope)
>
>
>>
>> Kenn
>>
>>
>> On Fri, Feb 16, 2018 at 9:04 AM, Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>>
>>> 2018-02-16 17:59 GMT+01:00 Kenneth Knowles <k...@google.com>:
>>>
>>>> What I am hearing is this:
>>>>
>>>>  - @FinishBundle does what you want (a reliable "flush" call) but your
>>>> runner is not doing a good job of bundling
>>>>
>>>
>>> Nop, finishbundle is defined but not a bundle. Typically for 1 million
>>> rows I'll get 1 million calls in flink and 1 call in spark (today) so this
>>> is not a way to call a final task to release dofn internal instances or do
>>> some one time auditing.
>>>
>>>
>>>>  - @Teardown has well-defined semantics and they are not what you want
>>>>
>>>
>>> "
>>> Note that calls to the annotated method are best effort, and may not
>>> occur for arbitrary reasons"
>>>
>>> is not really "well-defined" and is also a breaking change compared to
>>> the < 2.3.x (x >= 1) .
>>>
>>>
>>>> So you are hoping for something that is called less frequently but is
>>>> still mandatory.
>>>>
>>>> Just trying to establish the basics to start over and get this on track
>>>> to solving the real problem.
>>>>
>>>
>>> Concretely I need a well defined lifecycle for any DoFn executed in beam
>>> and today there is no such a thing making it impossible to develop
>>> correctly transforms/fn on an user side.
>>>
>>>
>>>>
>>>> Kenn
>>>>
>>>>
>>>> On Fri, Feb 16, 2018 at 8:51 AM, Romain Manni-Bucau <
>>>> rmannibu...@gmail.com> wrote:
>>>>
>>>>> finish bundle is well defined and must be called, right, not at the
>>>>> end so you still miss teardown as a user. Bundles are defined by the 
>>>>> runner
>>>>> and you can have 100000 bundles per batch (even more for a stream ;)) so
>>>>> you dont want to release your resources or handle you execution auditing 
>>>>> in
>>>>> it, you want it at the end so in tear down.
>>>>>
>>>>> So yes we must have teardown reliable somehow.
>>>>>
>>>>>
>>>>> Romain Manni-Bucau
>>>>> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
>>>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>>> <http://rmannibucau.wordpress.com> | Github
>>>>> <https://github.com/rmannibucau> | LinkedIn
>>>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>>>
>>>>> 2018-02-16 17:43 GMT+01:00 Reuven Lax <re...@google.com>:
>>>>>
>>>>>> +1 I think @FinishBundle is the right thing to look at here.
>>>>>>
>>>>>> On Fri, Feb 16, 2018, 8:41 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Romain
>>>>>>>
>>>>>>> Is it not @FinishBundle your solution ?
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>> Le 16 févr. 2018, à 17:06, Romain Manni-Bucau <rmannibu...@gmail.com>
>>>>>>> a écrit:
>>>>>>>>
>>>>>>>> I see Reuven, so it is actually a broken contract for end users
>>>>>>>> more than a bug. Concretely a user must have a way to execute code 
>>>>>>>> once the
>>>>>>>> teardown is no more used and a teardown is populated by the user in the
>>>>>>>> context of an execution.
>>>>>>>> It means that if the environment wants to pool (cache) the
>>>>>>>> instances it must provide a postBorrowFromCache and preReturnToCache 
>>>>>>>> to let
>>>>>>>> the user handle that - we'll get back to EJB and passivation ;).
>>>>>>>>
>>>>>>>> Personally I think it is fine to cache the instances for the
>>>>>>>> duration of an execution but not accross execution. Concretely if you 
>>>>>>>> check
>>>>>>>> out the API it should just not be possible for a runner since the 
>>>>>>>> lifecycle
>>>>>>>> is not covered and the fact teardown can not be called today is an
>>>>>>>> implementation bug/leak surfacing the API.
>>>>>>>>
>>>>>>>> So I see 2 options:
>>>>>>>>
>>>>>>>> 1. make it mandatory and get rid of the caching - which shouldnt
>>>>>>>> help much in current state in terms of perf
>>>>>>>> 2. keep teardown a final release object (which is not that useful
>>>>>>>> cause of the end of the sentence) and add a clean cache lifecycle
>>>>>>>> management
>>>>>>>>
>>>>>>>> tempted to say 1 is saner short terms, in particular cause beam is
>>>>>>>> 2.x and users already use it this way.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Romain Manni-Bucau
>>>>>>>> @rmannibucau <https://twitter.com/rmannibucau> |   Blog
>>>>>>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>>>>>> <http://rmannibucau.wordpress.com> |  Github
>>>>>>>> <https://github.com/rmannibucau> | LinkedIn
>>>>>>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>>>>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>>>>>>
>>>>>>>> 2018-02-16 16:53 GMT+01:00 Reuven Lax <re...@google.com>:
>>>>>>>>
>>>>>>>>> So the concern is that @TearDown might not be called?
>>>>>>>>>
>>>>>>>>> Let's understand the reason for @TearDown. The runner is free to
>>>>>>>>> cache the DoFn object across many invocations, and indeed in 
>>>>>>>>> streaming this
>>>>>>>>> is often a critical optimization. However if the runner does decide to
>>>>>>>>> destroy the DoFn object (e.g. because it's being evicted from cache), 
>>>>>>>>> often
>>>>>>>>> users need a callback to tear down associated resources (file 
>>>>>>>>> handles, RPC
>>>>>>>>> connections, etc.).
>>>>>>>>>
>>>>>>>>> Now @TearDown isn't guaranteed to be called for a simple reason:
>>>>>>>>> the runner might never tear down the DoFn object! The runner might 
>>>>>>>>> well
>>>>>>>>> decide to cache the object forever, in which case there is never .a 
>>>>>>>>> time to
>>>>>>>>> call @TearDown. There is no violation of semantics here.
>>>>>>>>>
>>>>>>>>> Also, the point about not calling teardown if the JVM crashes
>>>>>>>>> might well sound implicit with no need to mention it. However 
>>>>>>>>> empirically
>>>>>>>>> users do misunderstand even this, so it's worth mentioning.
>>>>>>>>>
>>>>>>>>> Reuven
>>>>>>>>>
>>>>>>>>> On Fri, Feb 16, 2018 at 2:11 AM, Romain Manni-Bucau <
>>>>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi guys,
>>>>>>>>>>
>>>>>>>>>> I'm a bit concerned of this PR
>>>>>>>>>> https://github.com/apache/beam/pull/4637
>>>>>>>>>>
>>>>>>>>>> I understand the intent but I'd like to share how I see it and
>>>>>>>>>> why it is an issue for me:
>>>>>>>>>>
>>>>>>>>>> 1. you can't help if the JVM crash in any case. Tomcat had a try
>>>>>>>>>> to preallocate some memory for instance to free it in case of OOME 
>>>>>>>>>> and try
>>>>>>>>>> to recover but it never prooved useful and got dropped recently. 
>>>>>>>>>> This is a
>>>>>>>>>> good example you can't do anything if there is a cataclism and 
>>>>>>>>>> therefore
>>>>>>>>>> any framework or lib will not be blamed for it
>>>>>>>>>> 2. if you expose an API, its behavior must be well defined. In
>>>>>>>>>> the case of a portable library like Beam it is even more important
>>>>>>>>>> otherwise it leads users to not use the API or the projet :(.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> These two points lead to say that if the JVM crashes it is ok to
>>>>>>>>>> not call teardown and it is even implicit in any programming 
>>>>>>>>>> environment so
>>>>>>>>>> no need to mention it. However that a runner doesn't call teardown 
>>>>>>>>>> is a bug
>>>>>>>>>> and not a feature or something intended because it can have a huge 
>>>>>>>>>> impact
>>>>>>>>>> on the user flow.
>>>>>>>>>>
>>>>>>>>>> The user workarounds are to use custom threads with timeouts to
>>>>>>>>>> execute the actions or things like that, all bad solutions to 
>>>>>>>>>> replace a
>>>>>>>>>> buggy API, if you remove the contract guarantee.
>>>>>>>>>>
>>>>>>>>>> To make it obvious: substring(from, to): will substring the
>>>>>>>>>> current string between from and to...or not. Would you use the 
>>>>>>>>>> function?
>>>>>>>>>>
>>>>>>>>>> What I ask is to add in the javadoc that the contract enforces
>>>>>>>>>> the runner to call that. Which means the core doesn't guarantee it 
>>>>>>>>>> but
>>>>>>>>>> imposes the runner to do so. This way the not portable behavior is 
>>>>>>>>>> where it
>>>>>>>>>> belongs to, in the vendor specific code. It leads to a reliable API 
>>>>>>>>>> for the
>>>>>>>>>> end user and let runners document they don't respect - yet - the API 
>>>>>>>>>> when
>>>>>>>>>> relevant.
>>>>>>>>>>
>>>>>>>>>> wdyt?
>>>>>>>>>>
>>>>>>>>>> Romain Manni-Bucau
>>>>>>>>>> @rmannibucau <https://twitter.com/rmannibucau> |   Blog
>>>>>>>>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>>>>>>>> <http://rmannibucau.wordpress.com> |  Github
>>>>>>>>>> <https://github.com/rmannibucau> | LinkedIn
>>>>>>>>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>>>>>>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: @TearDown guarantees

Reply via email to