I had a look at "GreedyPipelineFuser" and indeed this is exactly what I was talking about.
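To make the idea concrete for anyone following along, here is a rough sketch of the kind of chain fusion discussed in this thread. It is a toy model in plain Python, not Beam's GreedyPipelineFuser and not the Jet API: `fuse_chains`, the transform names, and the `repartitions` set are all made up for illustration. The only rule it applies is an assumption of mine: a transform joins its upstream stage when it has a single input, that input has no other consumer, and the transform does not need its input re-partitioned.

```python
from collections import defaultdict


def fuse_chains(nodes, edges, repartitions):
    """Group transforms into fused stages (hypothetical helper).

    nodes: transform names in topological order.
    edges: (upstream, downstream) pairs.
    repartitions: names of transforms whose input must be shuffled,
                  for example a group-by-key.
    """
    inputs = defaultdict(list)
    outputs = defaultdict(list)
    for up, down in edges:
        inputs[down].append(up)
        outputs[up].append(down)

    stage_of = {}  # transform name -> index into stages
    stages = []    # each stage is a list of transform names
    for node in nodes:
        ups = inputs[node]
        # Fuse only along a simple chain with no shuffle boundary.
        can_fuse = (
            len(ups) == 1
            and len(outputs[ups[0]]) == 1
            and node not in repartitions
        )
        if can_fuse:
            idx = stage_of[ups[0]]
            stages[idx].append(node)
        else:
            idx = len(stages)
            stages.append([node])
        stage_of[node] = idx
    return stages


# A linear pipeline with one shuffle in the middle (made-up names).
pipeline = ["read", "parse", "filter", "group_by_key", "format", "write"]
edges = list(zip(pipeline, pipeline[1:]))
stages = fuse_chains(pipeline, edges, repartitions={"group_by_key"})
print(stages)
```

With this input the six transforms collapse into two stages, split at the group-by-key, which is the "distinct vertices only where the data is re-partitioned" outcome. A real fuser has to handle more than this, such as branching outputs and side inputs.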
Is https://beam.apache.org/roadmap/portability/ still the best information
about the portable runners, or is there a more in-depth guide available
anywhere?

On Wed, Mar 20, 2019 at 2:29 PM Can Gencer <c...@hazelcast.com> wrote:

> Hi Max,
>
> Thanks. By "old-style runner", do you mean that this style of runner
> will become obsolete and only the portable one will be supported? The
> documentation for portable runners wasn't quite complete, and the barrier
> to entry for writing an old-style runner seemed lower for us. The
> old-style runner should also have better performance?
>
> On Wed, Mar 20, 2019 at 1:36 PM Maximilian Michels <m...@apache.org> wrote:
>
>> Hi Can,
>>
>> Thanks for the update. Interesting question. Flink has a built-in
>> optimization called chaining, which works together nicely with Beam.
>> Essentially, operators which share the same partitioning get executed
>> one after another inside a master operator. This saves resources.
>>
>> Interestingly, Beam's Fuser for portable Runners does something similar.
>> AFAIK there is no built-in solution for the old-style Runners. I think
>> it would be possible to build something like this on top of the existing
>> translation.
>>
>> Cheers,
>> Max
>>
>> On 20.03.19 13:07, Can Gencer wrote:
>> > Hi again,
>> >
>> > We've made some progress on the runner since writing this more than a
>> > month ago, the repo is available here publicly:
>> > https://github.com/hazelcast/hazelcast-jet-beam-runner
>> >
>> > Still very much a work in progress though. One of the issues I wanted
>> > to raise is that currently we're translating each PTransform to a Jet
>> > Vertex (could be considered analogous to a Flink operator or a vertex
>> > in Tez). This is sub-optimal, since Beam creates lots of transforms for
>> > computations that could be performed inside the same Vertex, such as
>> > subsequent mapping transforms and many others.
>> > Ideally you only need
>> > distinct vertices where the data is re-partitioned and/or shuffled. I'm
>> > curious if Beam offers some way of translating the PTransform graph to
>> > a more minimal set of transforms, i.e. some kind of planner, or would
>> > this have to be custom code? We've done a similar integration with
>> > Cascading in the past and it offered a planner which, given a set of
>> > rules, would partition the Cascading DAG into a minimal set of vertices
>> > for the same DAG. Curious if Beam has any similar functionality?
>> >
>> > On Sat, Feb 16, 2019 at 4:50 AM Kenneth Knowles <k...@apache.org> wrote:
>> >
>> >     Elaborating on what Robert alluded to: when I wrote that runner
>> >     author guide, portability was in its infancy. Now Beam Python can be
>> >     run on Flink. So that guide is primarily focused on the "deserialize
>> >     a Java DoFn and call its methods" approach. A decent amount of it is
>> >     still really important to know, but is now the responsibility of the
>> >     "SDK harness", aka language-specific coprocessor. For Python & Go &
>> >     <insert new SDK language here> you really want to use the
>> >     portability protos and the portable Flink runner is the best model.
>> >
>> >     Kenn
>> >
>> >     On Fri, Feb 15, 2019 at 2:08 AM Robert Bradshaw <rober...@google.com> wrote:
>> >
>> >         On Fri, Feb 15, 2019 at 7:36 AM Can Gencer <c...@hazelcast.com> wrote:
>> >         >
>> >         > We at Hazelcast are looking into writing a Beam runner for
>> >         > Hazelcast Jet (https://github.com/hazelcast/hazelcast-jet). I
>> >         > wanted to introduce myself as we'll likely have questions as
>> >         > we start development.
>> >
>> >         Welcome!
>> >
>> >         Hazelcast looks interesting, a Beam runner for it would be very
>> >         cool.
>> >
>> >         > Some of the things I'm wondering about currently:
>> >         >
>> >         > * Currently there seems to be a guide available at
>> >         > https://beam.apache.org/contribute/runner-guide/ , is this up
>> >         > to date? Is there anything in specific to be aware of when
>> >         > starting with a new runner that's not covered here?
>> >
>> >         That looks like a pretty good starting point. At a quick glance,
>> >         I don't see anything that looks out of date. Another resource
>> >         that might be helpful is a talk from last year on writing an SDK
>> >         (but as it mostly covers the runner-SDK interaction, it's also
>> >         quite useful for understanding the runner side):
>> >         https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
>> >         And please feel free to ask any questions on this list as well;
>> >         we'd be happy to help.
>> >
>> >         > * Should we be targeting the latest master which is at
>> >         > 2.12-SNAPSHOT or a stable version?
>> >
>> >         I would target the latest master.
>> >
>> >         > * After a runner is developed, how is the maintenance
>> >         > typically handled, as the runners seem to be part of the Beam
>> >         > codebase?
>> >
>> >         Either is possible. Several runner adapters are part of the Beam
>> >         codebase, but for example the IBM Streams Beam runner is not.
>> >         There are certainly pros and cons (certainly early on when the
>> >         APIs themselves were under heavy development it was easier to
>> >         keep things in sync in the same codebase, but things have mostly
>> >         stabilized now). A runner only becomes part of the Beam codebase
>> >         if there are members of the community committed to maintaining
>> >         it (which could include you). Both approaches are fine.
>> >
>> >         - Robert