Documentation on portability is still a bit sparse although there are many design documents: https://beam.apache.org/contribute/design-documents/#portability

The structure of portable Runners is not fundamentally different, but some of the operations are deferred to the SDK which runs code for all supported languages. The Runner needs to provide an integration with it.

Eventually, the old Runners will become obsolete though that won't happen very soon. Performance should be slightly better on the old Runners.

I think writing an old-style Runner now will give you enough experience to port it to the new language-portable style later on.

Cheers,
Max

On 20.03.19 14:52, Can Gencer wrote:
I had a look at "GreedyPipelineFuser" and indeed this was exactly what I was talking about.
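
To make the fusion idea discussed in this thread concrete, here is a minimal sketch. It is not Beam's GreedyPipelineFuser and uses no Beam API; the `Transform` type, the `needs_shuffle` flag, and the `fuse_linear_chain` function are hypothetical names chosen for illustration. It assumes a purely linear pipeline and starts a new vertex only where the data must be re-partitioned or shuffled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transform:
    """Hypothetical stand-in for a pipeline transform (not a Beam class)."""
    name: str
    # Assumption: True when the transform's input must be re-partitioned
    # or shuffled (for example a grouping step).
    needs_shuffle: bool = False


def fuse_linear_chain(transforms: List[Transform]) -> List[List[str]]:
    """Group consecutive transforms into vertices.

    A new vertex starts at the first transform and at every transform
    that requires a shuffle; all other transforms join the current vertex.
    """
    vertices: List[List[str]] = []
    for t in transforms:
        if not vertices or t.needs_shuffle:
            vertices.append([t.name])
        else:
            vertices[-1].append(t.name)
    return vertices


pipeline = [
    Transform("Read"),
    Transform("ParseLine"),
    Transform("ExtractKey"),
    Transform("GroupByKey", needs_shuffle=True),
    Transform("SumValues"),
    Transform("Format"),
]
fused = fuse_linear_chain(pipeline)
print(fused)
```

With this input, six transforms collapse into two vertices, split at the grouping step. A real pipeline is a DAG with branches and side inputs, so an actual fuser has to handle considerably more cases than this linear sketch.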

Is https://beam.apache.org/roadmap/portability/ still the best information about the portable runners or is there a more in-depth guide available anywhere?

On Wed, Mar 20, 2019 at 2:29 PM Can Gencer <c...@hazelcast.com> wrote:

    Hi Max,

    Thanks. By "old-style runner", do you mean that this style of
    runner will become obsolete and only the portable one will be
    supported? The documentation for portable runners wasn't quite
    complete, so the barrier to entry for writing an old-style runner
    seemed lower for us. Should the old-style runner also have better
    performance?

    On Wed, Mar 20, 2019 at 1:36 PM Maximilian Michels <m...@apache.org> wrote:

        Hi Can,

        Thanks for the update. Interesting question. Flink has a built-in
        optimization called chaining which works together nicely with Beam.
        Essentially, operators which share the same partitioning get executed
        one after another inside a master operator. This saves resources.

        Interestingly, Beam's Fuser for portable Runners does something similar.
        AFAIK there is no built-in solution for the old-style Runners. I think
        it would be possible to build something like this on top of the
        existing translation.

        Cheers,
        Max

        On 20.03.19 13:07, Can Gencer wrote:
         > Hi again,
         >
         > We've made some progress on the runner since writing this more than a
         > month ago. The repo is publicly available here:
         > https://github.com/hazelcast/hazelcast-jet-beam-runner
         >
         > Still very much a work in progress though. One of the issues I wanted to
         > raise is that currently we're translating each PTransform to a Jet
         > Vertex (which could be considered analogous to a Flink operator or a
         > vertex in Tez). This is sub-optimal, since Beam creates lots of
         > transforms for computations that could be performed inside the same
         > Vertex, such as subsequent mapping transforms and many others. Ideally
         > you only need distinct vertices where the data is re-partitioned and/or
         > shuffled. I'm curious if Beam offers some way of translating the
         > PTransform graph to a more minimal set of transforms, i.e. some kind of
         > planner, or would this have to be custom code? We've done a similar
         > integration with Cascading in the past, and it offered a planner which,
         > given a set of rules, would partition the Cascading DAG into a minimal
         > set of vertices for the same DAG. Curious if Beam has any similar
         > functionality?
         >
         >
         >
         > On Sat, Feb 16, 2019 at 4:50 AM Kenneth Knowles <k...@apache.org> wrote:
         >
         >     Elaborating on what Robert alluded to: when I wrote that runner
         >     author guide, portability was in its infancy. Now Beam Python can be
         >     run on Flink. So that guide is primarily focused on the "deserialize
         >     a Java DoFn and call its methods" approach. A decent amount of it is
         >     still really important to know, but is now the responsibility of the
         >     "SDK harness", aka language-specific coprocessor. For Python & Go &
         >     <insert new SDK language here> you really want to use the
         >     portability protos, and the portable Flink runner is the best model.
         >
         >     Kenn
         >
         >
         >     On Fri, Feb 15, 2019 at 2:08 AM Robert Bradshaw <rober...@google.com> wrote:
         >
         >         On Fri, Feb 15, 2019 at 7:36 AM Can Gencer <c...@hazelcast.com> wrote:
         >          >
         >          > We at Hazelcast are looking into writing a Beam runner for
         >          > Hazelcast Jet (https://github.com/hazelcast/hazelcast-jet). I
         >          > wanted to introduce myself as we'll likely have questions as we
         >          > start development.
         >
         >         Welcome!
         >
         >         Hazelcast looks interesting, a Beam runner for it would be very
         >         cool.
         >
         >          > Some of the things I'm wondering about currently:
         >          >
         >          > * Currently there seems to be a guide available at
         >          > https://beam.apache.org/contribute/runner-guide/ , is this up to
         >          > date? Is there anything specific to be aware of when starting
         >          > with a new runner that's not covered here?
         >
         >         That looks like a pretty good starting point. At a quick glance, I
         >         don't see anything that looks out of date. Another resource that
         >         might be helpful is a talk from last year on writing an SDK (as it
         >         mostly covers the runner-SDK interaction, it's also quite useful for
         >         understanding the runner side):
         >         https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
         >
         >         And please feel free to ask any questions on this list as well; we'd
         >         be happy to help.
         >
         >          > * Should we be targeting the latest master which is at
         >          > 2.12-SNAPSHOT or a stable version?
         >
         >         I would target the latest master.
         >
         >          > * After a runner is developed, how is the maintenance
         >          > typically handled, as the runners seem to be part of the
         >          > Beam codebase?
         >
         >         Either is possible. Several runner adapters are part of the Beam
         >         codebase, but for example the IBM Streams Beam runner is not. There
         >         are certainly pros and cons (early on, when the APIs themselves
         >         were under heavy development, it was easier to keep things in sync
         >         in the same codebase, but things have mostly stabilized now).
         >         A runner only becomes part of the Beam codebase if there are members
         >         of the community committed to maintaining it (which could include
         >         you). Both approaches are fine.
         >
         >         - Robert
         >
