Sure, Thank you for your confirmation Luke! :) Best, Jincheng
Luke Cwik <[email protected]> 于2019年10月29日周二 上午1:20写道: > I would go with creating JIRAs and PRs directly since this doesn't seem to > be contentious since you have received feedback from a few folks and they > are all suggesting the same thing. > > On Sun, Oct 27, 2019 at 9:27 PM jincheng sun <[email protected]> > wrote: > >> Hi all, >> >> Thanks a lot for your feedback. It seems that we have reached consensus >> that both "Approach 2" and "Approach 3" are needed. "Approach 3" is a good >> supplement for "Approach 2" and I also prefer "Approach 2" and "Approach 3" >> for now. >> >> Do we need to vote on this discussion or I can create JIRAs and submit >> the PRs directly? >> >> Best, >> Jincheng >> >> Luke Cwik <[email protected]> 于2019年10月26日周六 上午4:01写道: >> >>> Approach 3 is about caching the bundle descriptor forever but tearing >>> down a "live" instance of the DoFns at some SDK chosen arbitrary point in >>> time. This way if a future ProcessBundleRequest comes in, a new "live" >>> instance can be constructed. >>> Approach 2 is still needed so that when the workers are being >>> shutdown all the "live" instances are torn down. >>> >>> On Fri, Oct 25, 2019 at 12:56 PM Robert Burke <[email protected]> >>> wrote: >>> >>>> Approach 2 isn't incompatible with approach 3. 3 simple sets down >>>> convention/configuration for the conditions when the SDK will do this after >>>> process bundle has completed. >>>> >>>> >>>> >>>> On Fri, Oct 25, 2019, 12:34 PM Robert Bradshaw <[email protected]> >>>> wrote: >>>> >>>>> I think we'll still need approach (2) for when the pipeline finishes >>>>> and a runner is tearing down workers. >>>>> >>>>> On Fri, Oct 25, 2019 at 10:36 AM Maximilian Michels <[email protected]> >>>>> wrote: >>>>> > >>>>> > Hi Jincheng, >>>>> > >>>>> > Thanks for bringing this up and capturing the ideas in the doc. >>>>> > >>>>> > Intuitively, I would have also considered adding a new Proto message >>>>> for >>>>> > the teardown, but I think the idea to trigger this logic when the SDK >>>>> > Harness evicts process bundle descriptors is more elegant. >>>>> > >>>>> > Thanks, >>>>> > Max >>>>> > >>>>> > On 25.10.19 17:23, Luke Cwik wrote: >>>>> > > I like approach 3 since it doesn't add additional complexity to >>>>> the API >>>>> > > and individual SDKs can choose to implement any clean-up strategy >>>>> they >>>>> > > want or none at all which is the simplest. >>>>> > > >>>>> > > On Thu, Oct 24, 2019 at 8:46 PM jincheng sun < >>>>> [email protected] >>>>> > > <mailto:[email protected]>> wrote: >>>>> > > >>>>> > > Hi, >>>>> > > >>>>> > > Thanks for your comments in doc, I have add Approach 3 which >>>>> you >>>>> > > mentioned! @Luke >>>>> > > >>>>> > > For now, we should do a decision for Approach 3 and Approach 1. >>>>> > > Detail can be found in doc [1] >>>>> > > >>>>> > > Welcome anyone's feedback :) >>>>> > > >>>>> > > Regards, >>>>> > > Jincheng >>>>> > > >>>>> > > [1] >>>>> > > >>>>> https://docs.google.com/document/d/1sCgy9VQPf9zVXKRquK8P6N4x7aB62GEO8ozkujRSHZg/edit?usp=sharing >>>>> > > >>>>> > > jincheng sun <[email protected] >>>>> > > <mailto:[email protected]>> 于2019年10月25日周五 上午10:40写道: >>>>> > > >>>>> > > Hi, >>>>> > > >>>>> > > Functionally capable of `abort`, but it will be called at >>>>> the >>>>> > > end of operator. So, I prefer `dispose` semantics. i.e., >>>>> all >>>>> > > normal logic has been executed. >>>>> > > >>>>> > > Best, >>>>> > > Jincheng >>>>> > > >>>>> > > Harsh Vardhan <[email protected] <mailto: >>>>> [email protected]>> >>>>> > > 于2019年10月23日周三 上午12:14写道: >>>>> > > >>>>> > > Would approach 1 be akin to abort semantics? >>>>> > > >>>>> > > On Mon, Oct 21, 2019 at 8:01 PM jincheng sun >>>>> > > <[email protected] <mailto: >>>>> [email protected]>> >>>>> > > wrote: >>>>> > > >>>>> > > Hi Luke, >>>>> > > >>>>> > > Thanks a lot for your reply. Since it allows to >>>>> share >>>>> > > one SDK harness between multiple executable >>>>> stages, the >>>>> > > control service termination may occur much later >>>>> than >>>>> > > the completion of an executable stage. This is the >>>>> main >>>>> > > reason I prefer runners to control the teardown of >>>>> DoFns. >>>>> > > >>>>> > > Regarding to "SDK harnesses can terminate >>>>> instances any >>>>> > > time they want and start new instances anytime as >>>>> > > well.", personally I think it's not conflict with >>>>> the >>>>> > > proposed Approach 1 as the SDK harness could >>>>> decide what >>>>> > > to do when receiving the teardown request. It >>>>> could do >>>>> > > nothing if the DoFns has already been teared down >>>>> and >>>>> > > could also tear down the DoFns if needed. >>>>> > > >>>>> > > What do you think? >>>>> > > >>>>> > > Best, >>>>> > > Jincheng >>>>> > > >>>>> > > Luke Cwik <[email protected] <mailto: >>>>> [email protected]>> >>>>> > > 于2019年10月22日周二 上午2:05写道: >>>>> > > >>>>> > > Approach 2 is currently the suggested >>>>> approach[1] >>>>> > > for DoFn's to shutdown. >>>>> > > Note that SDK harnesses can terminate >>>>> instances any >>>>> > > time they want and start new instances anytime >>>>> as well. >>>>> > > >>>>> > > Why do you want to expose this logic so that >>>>> Runners >>>>> > > could control it? >>>>> > > >>>>> > > 1: >>>>> > > >>>>> https://docs.google.com/document/d/1n6s3BOxOPct3uF4UgbbI9O9rpdiKWFH9R6mtVmR7xp0/edit# >>>>> > > >>>>> > > On Mon, Oct 21, 2019 at 4:27 AM jincheng sun >>>>> > > <[email protected] >>>>> > > <mailto:[email protected]>> wrote: >>>>> > > >>>>> > > Hi, >>>>> > > I found that in `SdkHarness` do not stop >>>>> the >>>>> > > `SdkWorker` when finish. We should add the >>>>> > > logic for stop the `SdkWorker` in >>>>> `SdkHarness`. >>>>> > > More detail can be found [1]. >>>>> > > >>>>> > > There are two approaches to solve this >>>>> issue: >>>>> > > >>>>> > > Approach 1: We can add a Fn API for >>>>> teardown >>>>> > > purpose and the runner will teardown a >>>>> specific >>>>> > > bundle descriptor via this teardown Fn API >>>>> > > during disposing. >>>>> > > Approach 2: The control service termination >>>>> > > could be seen as a signal and once SDK >>>>> harness >>>>> > > receives this signal, the teardown of the >>>>> bundle >>>>> > > descriptor will be performed. >>>>> > > >>>>> > > More detail can be found in [2]. >>>>> > > >>>>> > > As the Approach 2, SDK harness could be >>>>> shared >>>>> > > between multiple executable stages. The >>>>> control >>>>> > > service termination only occurs when all >>>>> the >>>>> > > executable stages sharing the same SDK >>>>> harness >>>>> > > finished. This means that the teardown of >>>>> DoFns >>>>> > > may not be executed immediately after an >>>>> > > executable stage is finished. >>>>> > > >>>>> > > So, I prefer Approach 1. Welcome any >>>>> feedback :) >>>>> > > >>>>> > > Best, >>>>> > > Jincheng >>>>> > > >>>>> > > [1] >>>>> > > >>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/sdk_worker.py >>>>> > > [2] >>>>> > > >>>>> https://docs.google.com/document/d/1sCgy9VQPf9zVXKRquK8P6N4x7aB62GEO8ozkujRSHZg/edit?usp=sharing >>>>> > > >>>>> > > -- >>>>> > > >>>>> > > Got feedback? go/harsh-feedback >>>>> <https://goto.google.com/harsh-feedback> >>>>> > > <https://goto.google.com/harsh-feedback> >>>>> > > >>>>> >>>>
