+1

On Thu, Sep 13, 2018 at 12:53 PM Romain Manni-Bucau <[email protected]> wrote:
> If usable by itself without google karma (can you use a worker without
> dataflow itself?) it sounds awesome; otherwise it sounds weird IMHO.
>
> On Thu, Sep 13, 2018 at 21:36, Kai Jiang <[email protected]> wrote:
>
>> +1 (non googler)
>>
>> big help for transparency and for future runners.
>>
>> Best,
>> Kai
>>
>> On Thu, Sep 13, 2018, 11:45 Xinyu Liu <[email protected]> wrote:
>>
>>> Big +1 (non-googler).
>>>
>>> From Samza Runner's perspective, we are very happy to see dataflow
>>> worker code so we can learn and compete :).
>>>
>>> Thanks,
>>> Xinyu
>>>
>>> On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi <[email protected]>
>>> wrote:
>>>
>>>> +1 (non-googler)
>>>>
>>>> This is a great 👍 move
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Sep 13, 2018, at 2:25 PM, Tim Robertson <[email protected]>
>>>> wrote:
>>>>
>>>> +1 (non googler)
>>>> It sounds pragmatic, helps with transparency should issues arise and
>>>> enables more people to fix.
>>>>
>>>>
>>>> On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin <[email protected]>
>>>> wrote:
>>>>
>>>>> From my perspective as a (non-Google) community member, huge +1.
>>>>>
>>>>> I don't see anything bad for the community about open sourcing more of
>>>>> the probably-most-used runner. While the DirectRunner is probably still
>>>>> the most referential implementation of Beam, can't hurt to see more
>>>>> working code. Other runners or runner implementors can refer to this code
>>>>> if they want, and ignore it if they don't.
>>>>>
>>>>> In terms of having more code and tests to support, well, that's par
>>>>> for the course. Will this change make the things that need to be done to
>>>>> support them more obvious? (E.g., "this PR is blocked because someone at
>>>>> Google on Dataflow team has to fix something" vs "this PR is blocked
>>>>> because the Apache Beam code in foo/bar/baz is failing, and anyone who can
>>>>> see the code can fix it"). The latter seems like a clear win for the
>>>>> community.
>>>>>
>>>>> (As long as the code donation is handled properly, but that's
>>>>> completely orthogonal and I have no reason to think it wouldn't be.)
>>>>>
>>>>> Thanks,
>>>>> Dan
>>>>>
>>>>> On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik <[email protected]> wrote:
>>>>>
>>>>>> Yes, I'm specifically asking the community for opinions as to whether
>>>>>> it should be accepted or not.
>>>>>>
>>>>>> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> This is terrific!
>>>>>>>
>>>>>>> Is this thread asking for opinions from the community about whether it
>>>>>>> should be accepted? Assuming the Google-side decision is made to
>>>>>>> contribute, big +1 from me to include it next to other runners.
>>>>>>>
>>>>>>> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> At Google we have been importing the Apache Beam code base and
>>>>>>>> integrating it with the Google portion of the codebase that supports
>>>>>>>> the Dataflow worker. This process is painful as we regularly are
>>>>>>>> making breaking API changes to support libraries related to running
>>>>>>>> portable pipelines (and sometimes in other places as well). This has
>>>>>>>> sometimes made it difficult for PRs to make changes without either
>>>>>>>> breaking something for Google or waiting for a Googler to make the
>>>>>>>> change internally (e.g. dependency updates).
>>>>>>>>
>>>>>>>> This code is very similar to the other integrations that exist for
>>>>>>>> runners such as Flink/Spark/Apex/Samza. It is an adaptation layer that
>>>>>>>> sits on top of an execution engine. There is no super secret awesome
>>>>>>>> stuff as this code was already publicly visible in the past when it
>>>>>>>> was part of the Google Cloud Dataflow github repo[1].
>>>>>>>>
>>>>>>>> Process-wise, the code will need to get approval from Google to be
>>>>>>>> donated and to go through the code donation process, but before we
>>>>>>>> attempt to do that, I was wondering whether the community would
>>>>>>>> object to adding this code to the master branch?
>>>>>>>>
>>>>>>>> The upside is that people can make breaking changes and fix them for
>>>>>>>> all runners. It will also help Googlers contribute more to the
>>>>>>>> portability story as it will remove the burden of doing the code
>>>>>>>> import (wasted time) and it will allow people to develop in master
>>>>>>>> (can have the whole project loaded in a single IDE).
>>>>>>>>
>>>>>>>> The downside is that this will represent more code and unit tests
>>>>>>>> to support.
>>>>>>>>
>>>>>>>> 1:
>>>>>>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker
