I had a look at "GreedyPipelineFuser" and indeed this was what exactly I
was talking about.

Is https://beam.apache.org/roadmap/portability/ still the best information
about the portable runners or is there a more in-depth guide available
anywhere?

On Wed, Mar 20, 2019 at 2:29 PM Can Gencer <c...@hazelcast.com> wrote:

> Hi Max,
>
> Thanks. When you mean "old-style runner"  is this meant that this style of
> runners will be obsolete and only the portable one will be supported? The
> documentation for portable runners wasn't quite complete and the barrier to
> entry for writing an old style runner seemed easier for us and the old
> style runner should have better performance?
>
> On Wed, Mar 20, 2019 at 1:36 PM Maximilian Michels <m...@apache.org> wrote:
>
>> Hi Can,
>>
>> Thanks for the update. Interesting question. Flink has an optimization
>> built in called chaining which works together nicely with Beam.
>> Essentially, operators which share the same partitioning get executed
>> one after another inside a master operator. This saves resources.
>>
>> Interestingly, Beam's Fuser for portable Runners does something similar.
>> AFAIK there is no built-in solution for the old-style Runners. I think
>> it would be possible to build something like this on top of the existing
>> translation.
>>
>> Cheers,
>> Max
>>
>> On 20.03.19 13:07, Can Gencer wrote:
>> > Hi again,
>> >
>> > We've made some progress on the runner since writing this more than a
>> > month ago, the repo is available here publicly:
>> > https://github.com/hazelcast/hazelcast-jet-beam-runner
>> >
>> > Still very much a work in progress though. One of the issues I wanted
>> to
>> > raise is that currently we're translating each PTransform to a Jet
>> > Vertex (could be consider analogous to a Flink operator or a vertex in
>> > Tez). This is sub-optimal, since Beam creates lots of transforms for
>> > computations that could be performed inside the same Vertex, such as
>> > subsequent mapping transforms and many others. Ideally you only need
>> > distinct vertices where the data is re-partitioned and/or shuffled. I'm
>> > curious if Beam offers some way of translating the PTransform graph to
>> a
>> > more minimal set of transforms, i.e. some kind of planner or would this
>> > have to be custom code? We've done a similar integration with Cascading
>> > in the past and it offered a planner which given a set of rules would
>> > partition the Cascading DAG into a minimal set of vertices for the same
>> > DAG. Curious if Beam has any similar functionality?
>> >
>> >
>> >
>> > On Sat, Feb 16, 2019 at 4:50 AM Kenneth Knowles <k...@apache.org
>> > <mailto:k...@apache.org>> wrote:
>> >
>> >     Elaborating on what Robert alluded to: when I wrote that runner
>> >     author guide, portability was in its infancy. Now Beam Python can be
>> >     run on Flink. So that guide is primarily focused on the "deserialize
>> >     a Java DoFn and call its methods" approach. A decent amount of it is
>> >     still really important to know, but is now the responsibility of the
>> >     "SDK harness", aka language-specific coprocessor. For Python & Go &
>> >     <insert new SDK language here> you really want to use the
>> >     portability protos and the portable Flink runner is the best model.
>> >
>> >     Kenn
>> >
>> >
>> >     On Fri, Feb 15, 2019 at 2:08 AM Robert Bradshaw <
>> rober...@google.com
>> >     <mailto:rober...@google.com>> wrote:
>> >
>> >         On Fri, Feb 15, 2019 at 7:36 AM Can Gencer <c...@hazelcast.com
>> >         <mailto:c...@hazelcast.com>> wrote:
>> >          >
>> >          > We at Hazelcast are looking into writing a Beam runner for
>> >         Hazelcast Jet (https://github.com/hazelcast/hazelcast-jet). I
>> >         wanted to introduce myself as we'll likely have questions as we
>> >         start development.
>> >
>> >         Welcome!
>> >
>> >         Hazelcast looks interesting, a Beam runner for it would be very
>> >         cool.
>> >
>> >          > Some of the things I'm wondering about currently:
>> >          >
>> >          > * Currently there seems to be a guide available at
>> >         https://beam.apache.org/contribute/runner-guide/ , is this up
>> to
>> >         date? Is there anything in specific to be aware of when starting
>> >         with a new runner that's not covered here?
>> >
>> >         That looks like a pretty good starting point. At a quick
>> glance, I
>> >         don't see anything that looks out of date. Another resource that
>> >         might
>> >         be helpful is a talk from last year on writing an SDK (but as it
>> >         mostly covers the runner-sdk interaction, it's also quite
>> useful for
>> >         understanding the runner side:
>> >
>> https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
>> >         And please feel free to ask any questions on this list as well;
>> we'd
>> >         be happy to help.
>> >
>> >          > * Should we be targeting the latest master which is at
>> >         2.12-SNAPSHOT or a stable version?
>> >
>> >         I would target the latest master.
>> >
>> >          > * After a runner is developed, how is the maintenance
>> >         typically handled, as the runners seems to be part of Beam
>> codebase?
>> >
>> >         Either is possible. Several runner adapters are part of the Beam
>> >         codebase, but for example the IMB Streams Beam runner is not.
>> There
>> >         are certainly pros and cons (certainly early on when the APIs
>> >         themselves were under heavy development it was easier to keep
>> things
>> >         in sync in the same codebase, but things have mostly stabilized
>> >         now).
>> >         A runner only becomes part of the Beam codebase if there are
>> members
>> >         of the community committed to maintaining it (which could
>> include
>> >         you). Both approaches are fine.
>> >
>> >         - Robert
>> >
>>
>

Reply via email to