2018-02-18 18:36 GMT+01:00 Eugene Kirpichov <kirpic...@google.com>:

> "Machine state" is overly low-level because many of the possible reasons
> can happen on a perfectly fine machine.
> If you'd like to rephrase it to "it will be called except in various
> situations where it's logically impossible or impractical to guarantee
> that it's called", that's fine. Or you can list some of the examples
> above.
Sounds ok to me.

> The main point for the user is, you *will* see non-preventable situations
> where it couldn't be called - it's not just intergalactic crashes - so if
> the logic is very important (e.g. cleaning up a large amount of temporary
> files, shutting down a large number of VMs you started etc), you have to
> express it using one of the other methods that have stricter guarantees
> (which obviously come at a cost, e.g. no pass-by-reference).

FinishBundle has exactly the same guarantee, sadly, so I'm not sure which
other method you speak about. Concretely, if you make it really unreliable
- and that's what "best effort" sounds like to me - then users can't use it
to clean up anything; but if you make it "can happen, but it is unexpected
and means something went wrong", then it is fine to have a manual - or
automatic, if fancy - recovery procedure. This is where it makes all the
difference and impacts the developers and ops (all users, basically).

> On Sun, Feb 18, 2018 at 9:16 AM Romain Manni-Bucau <rmannibu...@gmail.com>
> wrote:
>
>> Agree Eugene, except that "best effort" means that. It is also often
>> used to say "at will", and this is what triggered this thread.
>>
>> I'm fine using "except if the machine state prevents it", but "best
>> effort" is too open and can be very badly and wrongly perceived by users
>> (like I did).
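[Editor's note: the bundle-vs-teardown contrast above can be illustrated with a plain-Java sketch. The class and method names below are hypothetical stand-ins, not the actual Beam SDK: a toy "runner" drives start-bundle/finish-bundle around every bundle it commits, while teardown is a separate, best-effort end-of-life call.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Beam SDK code) of the DoFn lifecycle under
// discussion: startBundle/finishBundle bracket each committed bundle,
// teardown is only called, best-effort, at end of life.
public class BundleLifecycleSketch {
    static class ConnectionPerBundleFn {
        List<String> log = new ArrayList<>();
        Object connection;

        void startBundle() { connection = new Object(); log.add("open"); }
        void processElement(String e) { log.add("process:" + e); }
        // A runner that commits a bundle has, by definition, managed to
        // call finishBundle for it - hence the "stricter guarantee".
        void finishBundle() { connection = null; log.add("close"); }
        // Best-effort: may never run if the worker is lost.
        void teardown() { log.add("teardown"); }
    }

    public static void main(String[] args) {
        ConnectionPerBundleFn fn = new ConnectionPerBundleFn();
        // A toy "runner" executing two bundles, then (maybe) teardown.
        for (String[] bundle : new String[][] {{"a", "b"}, {"c"}}) {
            fn.startBundle();
            for (String e : bundle) fn.processElement(e);
            fn.finishBundle();
        }
        fn.teardown();
        System.out.println(fn.log);
        // -> [open, process:a, process:b, close, open, process:c, close, teardown]
    }
}
```

The sketch also shows Romain's objection: per-bundle open/close is correct but costly when bundle sizes are small and uncontrolled.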
>>
>> Romain Manni-Bucau
>> @rmannibucau <https://twitter.com/rmannibucau> | Blog
>> <https://rmannibucau.metawerx.net/> | Old Blog
>> <http://rmannibucau.wordpress.com> | Github
>> <https://github.com/rmannibucau> | LinkedIn
>> <https://www.linkedin.com/in/rmannibucau> | Book
>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>
>> 2018-02-18 18:13 GMT+01:00 Eugene Kirpichov <kirpic...@google.com>:
>>
>>> It will not be called if it's impossible to call it: in the example
>>> situation you have (intergalactic crash), and in a number of more common
>>> cases: e.g. in case the worker container has crashed (e.g. user code in
>>> a different thread called a C library over JNI and it segfaulted), a JVM
>>> bug, a crash due to user code OOM, in case the worker has lost network
>>> connectivity (then it may be called but it won't be able to do anything
>>> useful), in case this is running on a preemptible VM and it was
>>> preempted by the underlying cluster manager without notice or the worker
>>> was too busy with other stuff (e.g. calling other Teardown functions)
>>> until the preemption timeout elapsed, in case the underlying hardware
>>> simply failed (which happens quite often at scale), and in many other
>>> conditions.
>>>
>>> "Best effort" is the commonly used term to describe such behavior.
>>> Please feel free to file bugs for cases where you observed a runner not
>>> call Teardown in a situation where it was possible to call it but the
>>> runner made insufficient effort.
>>>
>>> On Sun, Feb 18, 2018, 9:02 AM Romain Manni-Bucau <rmannibu...@gmail.com>
>>> wrote:
>>>
>>>> 2018-02-18 18:00 GMT+01:00 Eugene Kirpichov <kirpic...@google.com>:
>>>>
>>>>> On Sun, Feb 18, 2018, 2:06 AM Romain Manni-Bucau
>>>>> <rmannibu...@gmail.com> wrote:
>>>>>
>>>>>> On 18 Feb 2018 00:23, "Kenneth Knowles" <k...@google.com> wrote:
>>>>>>
>>>>>> On Sat, Feb 17, 2018 at 3:09 PM, Romain Manni-Bucau
>>>>>> <rmannibu...@gmail.com> wrote:
>>>>>>>
>>>>>>> If you give an example of a high-level need (e.g. "I'm trying to
>>>>>>> write an IO for system $x and it requires the following
>>>>>>> initialization and the following cleanup logic and the following
>>>>>>> processing in between") I'll be better able to help you.
>>>>>>>
>>>>>>> Take a simple example of a transform requiring a connection. Using
>>>>>>> bundles is a perf killer since their size is not controlled. Using
>>>>>>> teardown doesn't allow you to release the connection since it is a
>>>>>>> best-effort thing. Not releasing the connection makes you pay a lot
>>>>>>> - aws ;) - or prevents you from launching other processings -
>>>>>>> concurrency limit.
>>>>>>
>>>>>> For this example @Teardown is an exact fit. If things die so badly
>>>>>> that @Teardown is not called then nothing else can be called to close
>>>>>> the connection either. What AWS service are you thinking of that
>>>>>> stays open for a long time when everything at the other end has died?
>>>>>>
>>>>>> You assume connections are kind of stateless, but some (proprietary)
>>>>>> protocols require some closing exchanges which are not only "I'm
>>>>>> leaving".
>>>>>>
>>>>>> For AWS I was thinking about starting some services - machines - on
>>>>>> the fly at pipeline startup and closing them at the end. If teardown
>>>>>> is not called you leak machines and money. You can say it can be done
>>>>>> another way... as the full pipeline can ;).
>>>>>>
>>>>>> I don't want to be picky, but if Beam can't handle its components'
>>>>>> lifecycle it can't be used at scale for generic pipelines and is
>>>>>> bound to some particular IOs.
>>>>>>
>>>>>> What prevents enforcing teardown - ignoring the interstellar crash
>>>>>> case, which can't be handled by any human system?
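[Editor's note: the leak Romain describes can be sketched in plain Java. This is a hypothetical illustration, not Beam SDK code: a DoFn-style class acquires an expensive remote resource (e.g. a started VM) in setup() and releases it in teardown(). If the worker dies before teardown() runs (preemption, segfault, OOM), the resource keeps running and billing.]

```java
// Hypothetical sketch of the setup/teardown leak: if the worker is lost
// before teardown() runs, the remote resource is never released.
public class TeardownLeakSketch {
    static class ExpensiveResource {
        boolean open = true;          // stands in for a running, billed VM
        void close() { open = false; }
    }

    static class RemoteServiceFn {
        ExpensiveResource resource;
        void setup()    { resource = new ExpensiveResource(); } // e.g. start a VM
        void teardown() { resource.close(); }                   // e.g. stop it
    }

    public static void main(String[] args) {
        RemoteServiceFn healthy = new RemoteServiceFn();
        healthy.setup();
        healthy.teardown();                        // normal path: released
        System.out.println(healthy.resource.open); // false

        RemoteServiceFn crashed = new RemoteServiceFn();
        crashed.setup();
        // Worker lost here: teardown() is never invoked, so the "VM"
        // keeps running (and billing) until some external reconciliation
        // process notices it.
        System.out.println(crashed.resource.open); // true -> leaked
    }
}
```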
>>>>>> Nothing technically. Why do you push to not handle it? Is it due to
>>>>>> some legacy code in Dataflow or something else?
>>>>>
>>>>> Teardown *is* already documented and implemented this way
>>>>> (best-effort). So I'm not sure what kind of change you're asking for.
>>>>
>>>> Remove "best effort" from the javadoc. If it is not called then it is
>>>> a bug and we are done :).
>>>>
>>>>>> Also, what does it mean for the users? The direct runner does it, so
>>>>>> if a user uses the RI in tests, will he get a different behavior in
>>>>>> prod? Also don't forget the user doesn't know what the IOs he
>>>>>> composes use, so this is so impacting for the whole product that it
>>>>>> must be handled IMHO.
>>>>>>
>>>>>> I understand the portability culture is new in the big data world,
>>>>>> but it is not a reason to ignore what people did for years and do it
>>>>>> wrong before doing it right ;).
>>>>>>
>>>>>> My proposal is to list what can prevent guaranteeing - under normal
>>>>>> IT conditions - the execution of teardown. Then we see if we can
>>>>>> handle it, and only if there is a technical reason we can't do we
>>>>>> make it experimental/unsupported in the API. I know Spark and Flink
>>>>>> can; any known blocker for other runners?
>>>>>>
>>>>>> Technical note: even a kill should go through Java shutdown hooks,
>>>>>> otherwise your environment (the software enclosing Beam) is fully
>>>>>> unhandled and your overall system is uncontrolled. The only case
>>>>>> where this is not true is when the software is always owned by a
>>>>>> vendor and never installed on a customer environment. In that case
>>>>>> it belongs to the vendor to handle the Beam API, and not to Beam to
>>>>>> adjust its API for a vendor - otherwise all features unsupported by
>>>>>> one runner should be made optional, right?
>>>>>>
>>>>>> Not all state is about the network, even in distributed systems, so
>>>>>> this is key to have an explicit and defined lifecycle.
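[Editor's note: the shutdown-hook mechanism referred to above is `Runtime.addShutdownHook`. A minimal sketch follows; note that hooks run on normal JVM exit and on SIGTERM (a plain `kill`), but not on `kill -9`, a JVM crash, or hardware failure - which is precisely the gap that makes teardown best-effort.]

```java
// Minimal sketch of a JVM shutdown hook for releasing resources.
public class ShutdownHookSketch {
    public static void main(String[] args) {
        Thread cleanup = new Thread(() -> System.out.println("releasing connections"));
        Runtime.getRuntime().addShutdownHook(cleanup);
        // removeShutdownHook returns true only if the hook was previously
        // registered and shutdown has not yet begun.
        boolean wasRegistered = Runtime.getRuntime().removeShutdownHook(cleanup);
        System.out.println(wasRegistered); // true
        // Re-register so the cleanup actually runs at JVM exit
        // (normal exit or SIGTERM, but not kill -9 or a crash).
        Runtime.getRuntime().addShutdownHook(cleanup);
    }
}
```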
>>>>>>
>>>>>> Kenn