> - Flink JobService: in review <https://github.com/apache/beam/pull/5262>

That's TODO (the above PR was merged, but it doesn't contain the Flink job
service). Discussion about it is here:
https://docs.google.com/document/d/1xOaEEJrMmiSHprd-WiYABegfT129qqF-idUBINjxz8s/edit?ts=5afa1238

Thanks,
Thomas

On Fri, May 18, 2018 at 7:01 AM, Thomas Weise <t...@apache.org> wrote:

> Most of it should probably go to
> https://beam.apache.org/contribute/portability/
>
> Also for reference, here is the prototype doc:
> https://s.apache.org/beam-portability-team-doc
>
> Thomas
>
> On Fri, May 18, 2018 at 5:35 AM, Kenneth Knowles <k...@google.com> wrote:
>
>> This is awesome. Would you be up for adding a brief description at
>> https://beam.apache.org/contribute/#works-in-progress and maybe a
>> pointer to a gdoc with something like the contents of this email? (My
>> reasoning is that (a) the contribution guide should stay concise, but
>> (b) all this detail is helpful, yet (c) the detail may be
>> ever-changing, so a separate web page is not the best format.)
>>
>> Kenn
>>
>> On Thu, May 17, 2018 at 3:13 PM Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> A little over a month ago, a large group of Beam community members
>>> started working on a prototype of a portable Flink runner - that is,
>>> a runner that can execute Beam pipelines on Flink via the
>>> Portability API <https://s.apache.org/beam-runner-api>. The
>>> prototype was developed in a separate branch
>>> <https://github.com/bsidhom/beam/tree/hacking-job-server> and was
>>> successfully demonstrated at Flink Forward, where it ran Python and
>>> Go pipelines in a limited setting.
>>>
>>> Since then, a smaller group of people (Ankur Goenka, Axel Magnuson,
>>> Ben Sidhom and myself) have been working on productionizing the
>>> prototype to address its limitations and do things "the right way",
>>> preparing to reuse this work for developing other portable runners
>>> (e.g. Spark).
>>> This involves a surprising amount of work, since many important
>>> design and implementation concerns could be ignored for the purposes
>>> of a prototype. I wanted to give an update on where we stand now.
>>>
>>> Our immediate milestone in sight is to *run Java and Python batch
>>> WordCount examples against a distributed remote Flink cluster*. That
>>> involves a few moving parts, roughly in order of appearance:
>>>
>>> *Job submission:*
>>> - The SDK is configured to use a "portable runner", whose
>>> responsibility is to run the pipeline against a given JobService
>>> endpoint.
>>> - The portable runner converts the pipeline to a portable Pipeline
>>> proto.
>>> - The runner finds out which artifacts it needs to stage, and stages
>>> them against an ArtifactStagingService.
>>> - A Flink-specific JobService receives the Pipeline proto, performs
>>> some optimizations (e.g. fusion) and translates it into Flink
>>> datasets and functions.
>>>
>>> *Job execution:*
>>> - A Flink function executes a fused chain of Beam transforms (an
>>> "executable stage") by converting the input and the stage to bundles
>>> and executing them against an SDK harness.
>>> - The function starts the proper SDK harness and auxiliary services
>>> (e.g. artifact retrieval, side input handling) and wires them
>>> together.
>>> - The function feeds data to the harness and receives data back.
>>>
>>> *And here is our status of implementation for these parts:*
>>> basically, almost everything is either done or in review.
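The fusion and bundle-execution steps described in the email can be modeled with a small, self-contained sketch. The names below (`fuse`, `process_bundle`) are hypothetical and are NOT Beam's actual API; the real runner fuses Pipeline protos and executes bundles against an SDK harness over gRPC. This is only a toy model of the idea:

```python
# Illustrative sketch of "fusion": composing a linear chain of
# per-element transforms into one "executable stage" function, then
# executing that stage over a bundle of elements.
# These names are hypothetical, not Beam's real API.

def fuse(*fns):
    """Compose per-element functions into a single fused stage."""
    def stage(element):
        for fn in fns:
            element = fn(element)
        return element
    return stage

def process_bundle(stage, bundle):
    """Apply the fused stage to each element of a bundle."""
    return [stage(element) for element in bundle]

# Tiny WordCount-flavored chain: lowercase, then strip punctuation.
fused = fuse(str.lower, lambda token: token.strip('.,'))
print(process_bundle(fused, ['Hello,', 'World.']))  # ['hello', 'world']
```

The point of fusing before execution is that the whole chain crosses the runner/harness boundary once per bundle, rather than once per transform.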
>>>
>>> *Job submission:*
>>> - General-purpose portable runner in the Python SDK: done
>>> <https://github.com/apache/beam/pull/5301>; Java SDK: also done
>>> <https://github.com/apache/beam/pull/5150>
>>> - Artifact staging from the Python SDK: in review (PR
>>> <https://github.com/apache/beam/pull/5273>, PR
>>> <https://github.com/apache/beam/pull/5251>); in Java, it's also done
>>> - Flink JobService: in review
>>> <https://github.com/apache/beam/pull/5262>
>>> - Translation from a Pipeline proto to Flink datasets and functions:
>>> done <https://github.com/apache/beam/pull/5226>
>>> - ArtifactStagingService implementation that stages artifacts to a
>>> location on a distributed filesystem: in development (design is
>>> clear)
>>>
>>> *Job execution:*
>>> - Flink function for executing via an SDK harness: done
>>> <https://github.com/apache/beam/pull/5285>
>>> - APIs for managing the lifecycle of an SDK harness: done
>>> <https://github.com/apache/beam/pull/5152>
>>> - Specific implementation of those APIs using Docker: partly done
>>> <https://github.com/apache/beam/pull/5189>, partly in review
>>> <https://github.com/apache/beam/pull/5392>
>>> - ArtifactRetrievalService that retrieves artifacts from the location
>>> where ArtifactStagingService staged them: in development.
>>>
>>> We expect that the in-review parts will be done, and the
>>> in-development parts developed, in the next 2-3 weeks. We will, of
>>> course, update the community when this important milestone is
>>> reached.
>>>
>>> *After that, the next milestones include:*
>>> - Set up Java, Python and Go ValidatesRunner tests to run against
>>> the portable Flink runner, and get them to pass
>>> - Bring Python and Go to parity with Java in terms of such test
>>> coverage
>>> - Implement the portable Spark runner, with a similar lifecycle but
>>> reusing almost all of the Flink work
>>> - Add support for streaming to both (which requires SDF - that work
>>> is progressing in parallel and by this point should be in a suitable
>>> place)
>>>
>>> *For people who would like to get involved in this effort:* you can
>>> already help out by improving ValidatesRunner test coverage in
>>> Python and Go. Java has >300 such tests, while Python has only a
>>> handful. There will be a large amount of parallelizable work once we
>>> get the VR test suites running - stay tuned. SDF + portability is
>>> also expected to produce a lot of parallelizable work up for grabs
>>> within several weeks.
>>>
>>> Thanks!