I like approach 3 since it doesn't add additional complexity to the API and individual SDKs can choose to implement any clean-up strategy they want or none at all which is the simplest.
On Thu, Oct 24, 2019 at 8:46 PM jincheng sun <[email protected]> wrote: > Hi, > > Thanks for your comments in doc, I have add Approach 3 which you > mentioned! @Luke > > For now, we should do a decision for Approach 3 and Approach 1. Detail can > be found in doc [1] > > Welcome anyone's feedback :) > > Regards, > Jincheng > > [1] > https://docs.google.com/document/d/1sCgy9VQPf9zVXKRquK8P6N4x7aB62GEO8ozkujRSHZg/edit?usp=sharing > > jincheng sun <[email protected]> 于2019年10月25日周五 上午10:40写道: > >> Hi, >> >> Functionally capable of `abort`, but it will be called at the end of >> operator. So, I prefer `dispose` semantics. i.e., all normal logic has been >> executed. >> >> Best, >> Jincheng >> >> Harsh Vardhan <[email protected]> 于2019年10月23日周三 上午12:14写道: >> >>> Would approach 1 be akin to abort semantics? >>> >>> On Mon, Oct 21, 2019 at 8:01 PM jincheng sun <[email protected]> >>> wrote: >>> >>>> Hi Luke, >>>> >>>> Thanks a lot for your reply. Since it allows to share one SDK harness >>>> between multiple executable stages, the control service termination may >>>> occur much later than the completion of an executable stage. This is the >>>> main reason I prefer runners to control the teardown of DoFns. >>>> >>>> Regarding to "SDK harnesses can terminate instances any time they want >>>> and start new instances anytime as well.", personally I think it's not >>>> conflict with the proposed Approach 1 as the SDK harness could decide what >>>> to do when receiving the teardown request. It could do nothing if the DoFns >>>> has already been teared down and could also tear down the DoFns if needed. >>>> >>>> What do you think? >>>> >>>> Best, >>>> Jincheng >>>> >>>> Luke Cwik <[email protected]> 于2019年10月22日周二 上午2:05写道: >>>> >>>>> Approach 2 is currently the suggested approach[1] for DoFn's to >>>>> shutdown. >>>>> Note that SDK harnesses can terminate instances any time they want and >>>>> start new instances anytime as well. >>>>> >>>>> Why do you want to expose this logic so that Runners could control it? >>>>> >>>>> 1: >>>>> https://docs.google.com/document/d/1n6s3BOxOPct3uF4UgbbI9O9rpdiKWFH9R6mtVmR7xp0/edit# >>>>> >>>>> On Mon, Oct 21, 2019 at 4:27 AM jincheng sun <[email protected]> >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> I found that in `SdkHarness` do not stop the `SdkWorker` when >>>>>> finish. We should add the logic for stop the `SdkWorker` in >>>>>> `SdkHarness`. >>>>>> More detail can be found [1]. >>>>>> >>>>>> There are two approaches to solve this issue: >>>>>> >>>>>> Approach 1: We can add a Fn API for teardown purpose and the runner >>>>>> will teardown a specific bundle descriptor via this teardown Fn API >>>>>> during >>>>>> disposing. >>>>>> Approach 2: The control service termination could be seen as a signal >>>>>> and once SDK harness receives this signal, the teardown of the bundle >>>>>> descriptor will be performed. >>>>>> >>>>>> More detail can be found in [2]. >>>>>> >>>>>> As the Approach 2, SDK harness could be shared between multiple >>>>>> executable stages. The control service termination only occurs when all >>>>>> the >>>>>> executable stages sharing the same SDK harness finished. This means that >>>>>> the teardown of DoFns may not be executed immediately after an executable >>>>>> stage is finished. >>>>>> >>>>>> So, I prefer Approach 1. Welcome any feedback :) >>>>>> >>>>>> Best, >>>>>> Jincheng >>>>>> >>>>>> [1] >>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/sdk_worker.py >>>>>> [2] >>>>>> https://docs.google.com/document/d/1sCgy9VQPf9zVXKRquK8P6N4x7aB62GEO8ozkujRSHZg/edit?usp=sharing >>>>>> >>>>> -- >>> >>> Got feedback? go/harsh-feedback <https://goto.google.com/harsh-feedback> >>> >>
