Re: Thoughts on a reference runner to invest in?

Robert Bradshaw Tue, 12 Feb 2019 01:11:49 -0800

This is certainly an interesting question, and I definitely have my
opinions, but am curious as to what others think as well.

One thing that I think wasn't as clear from the outset is distinguishing
between the development of runners/core-java and development of a Java
reference runner itself. With the work on moving Flink to
portability, it turned out that work on the latter was not a prerequisite
for work on the former, and runners/core-java is the artifact that other
runners want to build on. I think that it is also the case, as suggested,
that a distributed runner's use of this shared library is a better
reference point (for other distributed runners) than one using the direct
runner (e.g. there is a much more obvious delineation between the runner's
responsibility and Beam code than in the direct runner where the boundaries
between orchestration, execution, and other concerns are not as clear).

As well as serving as a reference to runner implementers, the reference
runner can also be useful for prototyping (here I think Python holds an
advantage, but we're getting into subjective areas now), documenting (or
ideally augmenting the documentation of) the spec (here I'd say a smaller
advantage to Python, but neither runner is clean, straightforward, and
documented enough to serve this purpose well yet), and serving as a
lightweight universal local runner against which to develop (and possibly
use long term in place of a direct runner) new SDKs (here you'll get a wide
variety of answers whether Python or Java is easier to take on as a
dependency for a third language, or we could just package it up in a docker
image and take docker as a dependency).

On a more pragmatic note, one thing that helped both the Flink runner and
the FnApiRunner move forward is that they were driven by actual use
cases: Lyft has actual Python (necessitating portable) pipelines they
want to run on Flink, and the FnApiRunner is the direct runner for Python.
The Java ULR (at least where it is now) sits in an awkward place where its
only role is to be a reference rather than be used, which (in a world of
limited resources) makes it harder to justify investment.

- Robert

On Tue, Feb 12, 2019 at 3:53 AM Kenneth Knowles <[email protected]> wrote:

> Interesting silence here. You've got it right that the reason we initially
> chose Java was because of the cross-runner sharing. The reference runner
> could be the first target runner for any new feature and then its work
> could be used directly (or indirectly via copy/paste/modify if that works
> better) in other runners. Examples:
>
>  - The implementations of (pre-portability) state & timers in
> runners/core-java and prototyped in the Java DirectRunner made it a matter
> of a couple of days to implement on other runners, and they saw pretty
> quick adoption.
>  - Probably the same could be said for the first drafts of the runners,
> which re-used a bunch of runners/core-java and had each others' translation
> code as a reference.
>
> I'm interested if anyone would be willing to confirm whether the silence is
> because the FlinkRunner has forged ahead and the Dataflow worker is open source. It
> makes sense that the code from a distributed runner is an even better
> reference point if you are building another distributed runner. From the
> look of it, the SamzaRunner had no trouble getting started on portability.
>
> Kenn
>
> On Mon, Feb 11, 2019 at 6:04 PM Daniel Oliveira <[email protected]>
> wrote:
>
>> Yeah, the FnApiRunner is what I'm leaning towards too. I wasn't sure how
>> much demand there was for an actual reference implementation in Java
>> though, so I was hoping there were runner authors that would want to chime
>> in.
>>
>> On the other hand, the Flink runner could serve as a reference
>> implementation for portable features since it's further along, so maybe
>> it's not an issue regardless.
>>
>> On Mon, Feb 11, 2019 at 1:09 PM Sam Rohde <[email protected]> wrote:
>>
>>> Thanks for starting this thread. If I had to guess, I would say there is
>>> more of a demand for Python as it's more widely used in data science and
>>> analytics. Being pragmatic, the FnApiRunner already has more feature work
>>> than the Java reference runner, so we should go with that.
>>>
>>> -Sam
>>>
>>> On Fri, Feb 8, 2019 at 10:07 AM Daniel Oliveira <[email protected]>
>>> wrote:
>>>
>>>> Hello Beam dev community,
>>>>
>>>> For those who don't know me, I work for Google and I've been working on
>>>> the Java reference runner, which is a portable, local Java runner (it's
>>>> basically the direct runner with the portability APIs implemented). Our
>>>> goal in working on this was to have a portable runner which ran locally so
>>>> it could be used by users for testing portable pipelines, by devs for
>>>> testing new features with portability, and by runner authors as a simple
>>>> reference implementation of a portable runner.
>>>>
>>>> Due to various circumstances though, progress on the Java reference
>>>> runner has been pretty slow, and a Python runner which does pretty much the
>>>> same things was made to aid portability development in Python (called the
>>>> FnApiRunner). This runner is currently further along in feature work than
>>>> the Java reference runner, so we've been reevaluating if we should switch
>>>> to investing in it instead.
>>>>
>>>> My question to the community is: Which runner do you think would be
>>>> more valuable to the dev community and Beam users? For those of you who are
>>>> runner authors, do you have a preference for what language you'd like to
>>>> see a reference implementation in?
>>>>
>>>> Thanks,
>>>> Daniel Oliveira
>>>>
>>>
