On Sun, Feb 18, 2018, 2:06 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
> On Feb 18, 2018 at 00:23, "Kenneth Knowles" <k...@google.com> wrote:
>
> On Sat, Feb 17, 2018 at 3:09 PM, Romain Manni-Bucau <rmannibu...@gmail.com>
> wrote:
>>
>> If you give an example of a high-level need (e.g. "I'm trying to write an
>> IO for system $x and it requires the following initialization and the
>> following cleanup logic and the following processing in between") I'll be
>> better able to help you.
>>
>>
>> Take a simple example of a transform requiring a connection. Using
>> bundles is a perf killer since size is not controlled. Using teardown
>> doesn't allow you to release the connection since it is a best-effort
>> thing. Not releasing the connection makes you pay a lot - AWS ;) - or
>> prevents you from launching other processing - concurrency limit.
>>
>
> For this example @Teardown is an exact fit. If things die so badly that
> @Teardown is not called then nothing else can be called to close the
> connection either. What AWS service are you thinking of that stays open for
> a long time when everything at the other end has died?
>
>
> You assume connections are kind of stateless, but some (proprietary)
> protocols require closing exchanges that are more than just "I'm leaving".
>
> For AWS I was thinking about starting some services - machines - on the
> fly at pipeline startup and closing them at the end. If teardown is not
> called you leak machines and money. You can say it can be done another
> way... as the full pipeline ;).
>
> I don't want to be picky, but if Beam can't handle its components'
> lifecycle it can't be used at scale for generic pipelines and is bound to
> some particular IOs.
>
> What prevents enforcing teardown - ignoring the interstellar crash
> case, which can't be handled by any human system? Nothing technically. Why
> do you push to not handle it? Is it due to some legacy code on Dataflow or
> something else?
>
Teardown *is* already documented and implemented this way (best-effort).
So I'm not sure what kind of change you're asking for.

> Also what does it mean for the users? The direct runner does it, so if a
> user uses the RI in tests, will he get a different behavior in prod? Also
> don't forget the user doesn't know what the IOs he composes use, so this
> has such an impact on the whole product that it must be handled IMHO.
>
> I understand the portability culture is new in the big data world but it is
> not a reason to ignore what people did for years and do it wrong before
> doing it right ;).
>
> My proposal is to list what can prevent guaranteeing - in normal IT
> conditions - the execution of teardown. Then we see if we can handle it,
> and only if there is a technical reason we can't do we make it
> experimental/unsupported in the API. I know Spark and Flink can; any
> unknown blocker for other runners?
>
> Technical note: even a kill should go through Java shutdown hooks,
> otherwise your environment (the software enclosing Beam) is fully unhandled
> and your overall system is uncontrolled. The only case where this is not
> true is when the software is always owned by a vendor and never installed
> in a customer environment. In this case it belongs to the vendor to handle
> the Beam API and not to Beam to adjust its API for a vendor - otherwise all
> features unsupported by one runner should be made optional, right?
>
> Not all state is about the network, even in distributed systems, so it is
> key to have an explicit and defined lifecycle.
>
> Kenn
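
To make the shutdown-hook point from the technical note concrete, here is a minimal Java sketch. It is not Beam code and does not use the Beam API: `TeardownSketch`, `FakeConnection`, `setup()` and `teardown()` are hypothetical names that only mimic a setup/teardown lifecycle. The assumption is that teardown may or may not be called, so a JVM shutdown hook is registered as a fallback and the close is made idempotent.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical lifecycle holder; illustrative only, not a Beam class. */
public class TeardownSketch {

    /** Hypothetical stand-in for a connection that needs a closing exchange. */
    public static class FakeConnection {
        private final AtomicInteger closeCalls = new AtomicInteger();
        private volatile boolean closed = false;

        /** Idempotent close: only the first call performs the closing exchange. */
        public synchronized void close() {
            if (!closed) {
                closed = true;
                closeCalls.incrementAndGet();
            }
        }

        public boolean isClosed() { return closed; }

        public int closeCalls() { return closeCalls.get(); }
    }

    private FakeConnection connection;
    private Thread fallbackHook;

    /** Setup step: open the resource and register a shutdown hook as a fallback. */
    public void setup() {
        connection = new FakeConnection();
        fallbackHook = new Thread(connection::close);
        Runtime.getRuntime().addShutdownHook(fallbackHook);
    }

    /** Teardown step: release the resource and unregister the fallback hook. */
    public void teardown() {
        connection.close();
        try {
            Runtime.getRuntime().removeShutdownHook(fallbackHook);
        } catch (IllegalStateException e) {
            // The JVM is already shutting down; the hook will run and find
            // the connection already closed.
        }
    }

    public FakeConnection connection() { return connection; }

    public static void main(String[] args) {
        TeardownSketch sketch = new TeardownSketch();
        sketch.setup();
        sketch.teardown();
        sketch.teardown(); // a second call is harmless
        System.out.println("closed=" + sketch.connection().isClosed()
            + " closeCalls=" + sketch.connection().closeCalls());
    }
}
```

JVM shutdown hooks run on normal exit and on SIGTERM, but not on SIGKILL or a hard JVM crash, so a fallback like this narrows the window in which a resource leaks without closing it entirely.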