> - Flink JobService: in review <https://github.com/apache/beam/pull/5262>

That's TODO (the above PR was merged, but it doesn't contain the Flink job
service). Discussion about it is here:
https://docs.google.com/document/d/1xOaEEJrMmiSHprd-WiYABegfT129qqF-idUBINjxz8s/edit?ts=5afa1238

Thanks,
Thomas

On Fri, May 18, 2018 at 7:01 AM, Thomas Weise <t...@apache.org> wrote:

> Most of it should probably go to
> https://beam.apache.org/contribute/portability/
>
> Also for reference, here is the prototype doc:
> https://s.apache.org/beam-portability-team-doc
>
> Thomas
>
> On Fri, May 18, 2018 at 5:35 AM, Kenneth Knowles <k...@google.com> wrote:
>
>> This is awesome. Would you be up for adding a brief description at
>> https://beam.apache.org/contribute/#works-in-progress and maybe a
>> pointer to a gdoc with something like the contents of this email? (My
>> reasoning is that (a) the contribution guide should stay concise, but
>> (b) all this detail is helpful, yet (c) the detail may be
>> ever-changing, so a separate web page is not the best format.)
>>
>> Kenn
>>
>> On Thu, May 17, 2018 at 3:13 PM Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> A little over a month ago, a large group of Beam community members
>>> started working on a prototype of a portable Flink runner - that is,
>>> a runner that can execute Beam pipelines on Flink via the
>>> Portability API <https://s.apache.org/beam-runner-api>. The
>>> prototype was developed in a separate branch
>>> <https://github.com/bsidhom/beam/tree/hacking-job-server> and was
>>> successfully demonstrated at Flink Forward, where it ran Python and
>>> Go pipelines in a limited setting.
>>>
>>> Since then, a smaller group of people (Ankur Goenka, Axel Magnuson,
>>> Ben Sidhom and myself) have been working on productionizing the
>>> prototype to address its limitations and do things "the right way",
>>> preparing to reuse this work for developing other portable runners
>>> (e.g. Spark).
>>> This involves a surprising amount of work, since many important
>>> design and implementation concerns could be ignored for the purposes
>>> of a prototype. I wanted to give an update on where we stand now.
>>>
>>> Our immediate milestone in sight is to *run Java and Python batch
>>> WordCount examples against a distributed remote Flink cluster*. That
>>> involves a few moving parts, roughly in order of appearance:
>>>
>>> *Job submission:*
>>> - The SDK is configured to use a "portable runner", whose
>>> responsibility is to run the pipeline against a given JobService
>>> endpoint.
>>> - The portable runner converts the pipeline to a portable Pipeline
>>> proto.
>>> - The runner finds out which artifacts it needs to stage, and stages
>>> them against an ArtifactStagingService.
>>> - A Flink-specific JobService receives the Pipeline proto, performs
>>> some optimizations (e.g. fusion) and translates it into Flink
>>> datasets and functions.
>>>
>>> *Job execution:*
>>> - A Flink function executes a fused chain of Beam transforms (an
>>> "executable stage") by converting the input and the stage to bundles
>>> and executing them against an SDK harness.
>>> - The function starts the proper SDK harness and auxiliary services
>>> (e.g. artifact retrieval, side input handling) and wires them
>>> together.
>>> - The function feeds data to the harness and receives data back.
>>>
>>> *And here is our status of implementation for these parts:*
>>> basically, almost everything is either done or in review.
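The fusion and bundle-execution steps described in the email can be modeled with a small, self-contained sketch. The names below (`fuse`, `process_bundle`) are hypothetical and are NOT Beam's actual API; the real runner fuses Pipeline protos and executes bundles against an SDK harness over gRPC. This is only a toy model of the idea:

```python
# Illustrative sketch of "fusion": composing a linear chain of
# per-element transforms into one "executable stage" function, then
# executing that stage over a bundle of elements.
# These names are hypothetical, not Beam's real API.

def fuse(*fns):
    """Compose per-element functions into a single fused stage."""
    def stage(element):
        for fn in fns:
            element = fn(element)
        return element
    return stage

def process_bundle(stage, bundle):
    """Apply the fused stage to each element of a bundle."""
    return [stage(element) for element in bundle]

# Tiny WordCount-flavored chain: lowercase, then strip punctuation.
fused = fuse(str.lower, lambda token: token.strip('.,'))
print(process_bundle(fused, ['Hello,', 'World.']))  # ['hello', 'world']
```

The point of fusing before execution is that the whole chain crosses the runner/harness boundary once per bundle, rather than once per transform.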
>>>
>>> *Job submission:*
>>> - General-purpose portable runner in the Python SDK: done
>>> <https://github.com/apache/beam/pull/5301>; Java SDK: also done
>>> <https://github.com/apache/beam/pull/5150>
>>> - Artifact staging from the Python SDK: in review (PR
>>> <https://github.com/apache/beam/pull/5273>, PR
>>> <https://github.com/apache/beam/pull/5251>); in Java, it's also done
>>> - Flink JobService: in review
>>> <https://github.com/apache/beam/pull/5262>
>>> - Translation from a Pipeline proto to Flink datasets and functions:
>>> done <https://github.com/apache/beam/pull/5226>
>>> - ArtifactStagingService implementation that stages artifacts to a
>>> location on a distributed filesystem: in development (design is
>>> clear)
>>>
>>> *Job execution:*
>>> - Flink function for executing via an SDK harness: done
>>> <https://github.com/apache/beam/pull/5285>
>>> - APIs for managing the lifecycle of an SDK harness: done
>>> <https://github.com/apache/beam/pull/5152>
>>> - Specific implementation of those APIs using Docker: partly done
>>> <https://github.com/apache/beam/pull/5189>, partly in review
>>> <https://github.com/apache/beam/pull/5392>
>>> - ArtifactRetrievalService that retrieves artifacts from the location
>>> where ArtifactStagingService staged them: in development.
>>>
>>> We expect that the in-review parts will be done, and the
>>> in-development parts developed, in the next 2-3 weeks. We will, of
>>> course, update the community when this important milestone is
>>> reached.
>>>
>>> *After that, the next milestones include:*
>>> - Set up Java, Python and Go ValidatesRunner tests to run against
>>> the portable Flink runner, and get them to pass
>>> - Bring Python and Go to parity with Java in terms of such test
>>> coverage
>>> - Implement the portable Spark runner, with a similar lifecycle but
>>> reusing almost all of the Flink work
>>> - Add support for streaming to both (which requires SDF - that work
>>> is progressing in parallel and by this point should be in a suitable
>>> place)
>>>
>>> *For people who would like to get involved in this effort:* you can
>>> already help out by improving ValidatesRunner test coverage in
>>> Python and Go. Java has >300 such tests, while Python has only a
>>> handful. There will be a large amount of parallelizable work once we
>>> get the VR test suites running - stay tuned. SDF + portability is
>>> also expected to produce a lot of parallelizable work up for grabs
>>> within several weeks.
>>>
>>> Thanks!