Documentation on portability is still a bit sparse although there are
many design documents:
https://beam.apache.org/contribute/design-documents/#portability
The structure of portable Runners is not fundamentally different, but
some of the operations are deferred to the SDK which runs code for all
supported languages. The Runner needs to provide an integration with it.
Eventually, the old Runners will become obsolete though that won't
happen very soon. Performance should be slightly better on the old Runners.
I think writing an old-style Runner now will give you enough experience
to port it to the new language-portable style later on.
Cheers,
Max
On 20.03.19 14:52, Can Gencer wrote:
I had a look at "GreedyPipelineFuser" and indeed this was what exactly I
was talking about.
Is https://beam.apache.org/roadmap/portability/ still the best
information about the portable runners or is there a more in-depth guide
available anywhere?
On Wed, Mar 20, 2019 at 2:29 PM Can Gencer <c...@hazelcast.com
<mailto:c...@hazelcast.com>> wrote:
Hi Max,
Thanks. When you mean "old-style runner" is this meant that this
style of runners will be obsolete and only the portable one will be
supported? The documentation for portable runners wasn't quite
complete and the barrier to entry for writing an old style runner
seemed easier for us and the old style runner should have better
performance?
On Wed, Mar 20, 2019 at 1:36 PM Maximilian Michels <m...@apache.org
<mailto:m...@apache.org>> wrote:
Hi Can,
Thanks for the update. Interesting question. Flink has an
optimization
built in called chaining which works together nicely with Beam.
Essentially, operators which share the same partitioning get
executed
one after another inside a master operator. This saves resources.
Interestingly, Beam's Fuser for portable Runners does something
similar.
AFAIK there is no built-in solution for the old-style Runners. I
think
it would be possible to build something like this on top of the
existing
translation.
Cheers,
Max
On 20.03.19 13:07, Can Gencer wrote:
> Hi again,
>
> We've made some progress on the runner since writing this
more than a
> month ago, the repo is available here publicly:
> https://github.com/hazelcast/hazelcast-jet-beam-runner
>
> Still very much a work in progress though. One of the issues
I wanted to
> raise is that currently we're translating each PTransform to
a Jet
> Vertex (could be consider analogous to a Flink operator or a
vertex in
> Tez). This is sub-optimal, since Beam creates lots of
transforms for
> computations that could be performed inside the same Vertex,
such as
> subsequent mapping transforms and many others. Ideally you
only need
> distinct vertices where the data is re-partitioned and/or
shuffled. I'm
> curious if Beam offers some way of translating the PTransform
graph to a
> more minimal set of transforms, i.e. some kind of planner or
would this
> have to be custom code? We've done a similar integration with
Cascading
> in the past and it offered a planner which given a set of
rules would
> partition the Cascading DAG into a minimal set of vertices
for the same
> DAG. Curious if Beam has any similar functionality?
>
>
>
> On Sat, Feb 16, 2019 at 4:50 AM Kenneth Knowles
<k...@apache.org <mailto:k...@apache.org>
> <mailto:k...@apache.org <mailto:k...@apache.org>>> wrote:
>
> Elaborating on what Robert alluded to: when I wrote that
runner
> author guide, portability was in its infancy. Now Beam
Python can be
> run on Flink. So that guide is primarily focused on the
"deserialize
> a Java DoFn and call its methods" approach. A decent
amount of it is
> still really important to know, but is now the
responsibility of the
> "SDK harness", aka language-specific coprocessor. For
Python & Go &
> <insert new SDK language here> you really want to use the
> portability protos and the portable Flink runner is the
best model.
>
> Kenn
>
>
> On Fri, Feb 15, 2019 at 2:08 AM Robert Bradshaw
<rober...@google.com <mailto:rober...@google.com>
> <mailto:rober...@google.com
<mailto:rober...@google.com>>> wrote:
>
> On Fri, Feb 15, 2019 at 7:36 AM Can Gencer
<c...@hazelcast.com <mailto:c...@hazelcast.com>
> <mailto:c...@hazelcast.com
<mailto:c...@hazelcast.com>>> wrote:
> >
> > We at Hazelcast are looking into writing a Beam
runner for
> Hazelcast Jet
(https://github.com/hazelcast/hazelcast-jet). I
> wanted to introduce myself as we'll likely have
questions as we
> start development.
>
> Welcome!
>
> Hazelcast looks interesting, a Beam runner for it
would be very
> cool.
>
> > Some of the things I'm wondering about currently:
> >
> > * Currently there seems to be a guide available at
> https://beam.apache.org/contribute/runner-guide/ , is this up to
> date? Is there anything in specific to be aware of
when starting
> with a new runner that's not covered here?
>
> That looks like a pretty good starting point. At a
quick glance, I
> don't see anything that looks out of date. Another
resource that
> might
> be helpful is a talk from last year on writing an SDK
(but as it
> mostly covers the runner-sdk interaction, it's also
quite useful for
> understanding the runner side:
>
https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
> And please feel free to ask any questions on this
list as well; we'd
> be happy to help.
>
> > * Should we be targeting the latest master which is at
> 2.12-SNAPSHOT or a stable version?
>
> I would target the latest master.
>
> > * After a runner is developed, how is the maintenance
> typically handled, as the runners seems to be part of
Beam codebase?
>
> Either is possible. Several runner adapters are part
of the Beam
> codebase, but for example the IMB Streams Beam runner
is not. There
> are certainly pros and cons (certainly early on when
the APIs
> themselves were under heavy development it was easier
to keep things
> in sync in the same codebase, but things have mostly
stabilized
> now).
> A runner only becomes part of the Beam codebase if
there are members
> of the community committed to maintaining it (which
could include
> you). Both approaches are fine.
>
> - Robert
>