Romain, the code is very similar to the adaptation layer between the shared libraries that are part of Apache Beam and any other runner, for example the code within runners/spark, runners/apex or runners/flink. If someone wanted to build an emulator of the Dataflow service, they would be able to re-use this code, but that is as impractical as writing an emulator for Flink or Spark and plugging it in as the dependency for runners/flink and runners/spark respectively.
On Thu, Sep 13, 2018 at 2:07 PM Raghu Angadi <rang...@google.com> wrote:

> On Thu, Sep 13, 2018 at 12:53 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
>> If usable by itself without google karma (can you use a worker without
>> dataflow itself?) it sounds awesome otherwise it sounds weird IMHO.
>
> Can you elaborate a bit more on using a worker without Dataflow? I
> essentially see that as a part of the Dataflow runner. A runner is
> specific to a platform.
>
> I am a Googler, but commenting as a community member.
>
> Raghu.
>
>> On Thu, Sep 13, 2018, 21:36, Kai Jiang <jiang...@gmail.com> wrote:
>>
>>> +1 (non googler)
>>>
>>> big help for transparency and for future runners.
>>>
>>> Best,
>>> Kai
>>>
>>> On Thu, Sep 13, 2018, 11:45 Xinyu Liu <xinyuliu...@gmail.com> wrote:
>>>
>>>> Big +1 (non-googler).
>>>>
>>>> From Samza Runner's perspective, we are very happy to see dataflow
>>>> worker code so we can learn and compete :).
>>>>
>>>> Thanks,
>>>> Xinyu
>>>>
>>>> On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi <suneel.mar...@gmail.com> wrote:
>>>>
>>>>> +1 (non-googler)
>>>>>
>>>>> This is a great 👍 move
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Sep 13, 2018, at 2:25 PM, Tim Robertson <timrobertson...@gmail.com> wrote:
>>>>>
>>>>> +1 (non googler)
>>>>> It sounds pragmatic, helps with transparency should issues arise and
>>>>> enables more people to fix them.
>>>>>
>>>>> On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin <dhalp...@apache.org> wrote:
>>>>>
>>>>>> From my perspective as a (non-Google) community member, huge +1.
>>>>>>
>>>>>> I don't see anything bad for the community about open sourcing more
>>>>>> of the probably-most-used runner. While the DirectRunner is probably
>>>>>> still the most referential implementation of Beam, it can't hurt to
>>>>>> see more working code. Other runners or runner implementors can
>>>>>> refer to this code if they want, and ignore it if they don't.
>>>>>>
>>>>>> In terms of having more code and tests to support, well, that's par
>>>>>> for the course. Will this change make the things that need to be
>>>>>> done to support them more obvious? (E.g., "this PR is blocked
>>>>>> because someone at Google on Dataflow team has to fix something" vs
>>>>>> "this PR is blocked because the Apache Beam code in foo/bar/baz is
>>>>>> failing, and anyone who can see the code can fix it"). The latter
>>>>>> seems like a clear win for the community.
>>>>>>
>>>>>> (As long as the code donation is handled properly, but that's
>>>>>> completely orthogonal and I have no reason to think it wouldn't be.)
>>>>>>
>>>>>> Thanks,
>>>>>> Dan
>>>>>>
>>>>>> On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>
>>>>>>> Yes, I'm specifically asking the community for opinions as to
>>>>>>> whether it should be accepted or not.
>>>>>>>
>>>>>>> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi <rang...@google.com> wrote:
>>>>>>>
>>>>>>>> This is terrific!
>>>>>>>>
>>>>>>>> Is this thread asking for opinions from the community about
>>>>>>>> whether it should be accepted? Assuming the Google-side decision
>>>>>>>> is made to contribute, big +1 from me to include it next to the
>>>>>>>> other runners.
>>>>>>>>
>>>>>>>> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>>>
>>>>>>>>> At Google we have been importing the Apache Beam code base and
>>>>>>>>> integrating it with the Google portion of the codebase that
>>>>>>>>> supports the Dataflow worker. This process is painful as we are
>>>>>>>>> regularly making breaking API changes to support libraries
>>>>>>>>> related to running portable pipelines (and sometimes in other
>>>>>>>>> places as well). This has sometimes made it difficult for PRs to
>>>>>>>>> make changes without either breaking something for Google or
>>>>>>>>> waiting for a Googler to make the change internally (e.g.
>>>>>>>>> dependency updates).
>>>>>>>>>
>>>>>>>>> This code is very similar to the other integrations that exist
>>>>>>>>> for runners such as Flink/Spark/Apex/Samza. It is an adaptation
>>>>>>>>> layer that sits on top of an execution engine. There is no super
>>>>>>>>> secret awesome stuff as this code was already publicly visible
>>>>>>>>> in the past when it was part of the Google Cloud Dataflow github
>>>>>>>>> repo[1].
>>>>>>>>>
>>>>>>>>> Process-wise, the code will need approval from Google to be
>>>>>>>>> donated and will need to go through the code donation process,
>>>>>>>>> but before we attempt that, I was wondering whether the
>>>>>>>>> community would object to adding this code to the master branch?
>>>>>>>>>
>>>>>>>>> The upside is that people can make breaking changes and fix them
>>>>>>>>> for all runners. It will also help Googlers contribute more to
>>>>>>>>> the portability story as it will remove the burden of doing the
>>>>>>>>> code import (wasted time) and it will allow people to develop in
>>>>>>>>> master (can have the whole project loaded in a single IDE).
>>>>>>>>>
>>>>>>>>> The downside is that this will represent more code and unit
>>>>>>>>> tests to support.
>>>>>>>>>
>>>>>>>>> 1:
>>>>>>>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker