I agree with Frances: optimizations are mostly runner-specific, and Beam
will probably provide a library of common (well-optimized) graph
transformations and composite transforms.
We could, however, provide some sort of pipeline "reviewer" by implementing
a PipelineVisitor that reviews the graph of transforms that is about to
be executed. For example: if my graph uses GroupByKey, I'd be notified
that Combine.perKey could be more efficient (which is right in most cases,
though not all).
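To illustrate the idea, here is a minimal, self-contained sketch of such a
"reviewer". Note this does not use the real Beam SDK: Transform,
PipelineGraph, and PipelineReviewer below are hypothetical stand-ins for
Beam's PTransform, Pipeline, and PipelineVisitor, just to show the
advisory-visitor pattern.

```python
# Toy sketch of a pipeline "reviewer" (hypothetical classes, not Beam's API).
from dataclasses import dataclass, field

@dataclass
class Transform:
    name: str        # e.g. "GroupByKey", "ParDo", "Combine.perKey"
    label: str = ""  # user-assigned step label

@dataclass
class PipelineGraph:
    transforms: list = field(default_factory=list)

    def accept(self, visitor):
        # Walk transforms in order, mirroring how a PipelineVisitor
        # traverses the transform graph before execution.
        for t in self.transforms:
            visitor.visit_transform(t)

class PipelineReviewer:
    """Collects advisory warnings; it never mutates the graph."""
    def __init__(self):
        self.warnings = []

    def visit_transform(self, t):
        if t.name == "GroupByKey":
            self.warnings.append(
                f"{t.label or t.name}: consider Combine.perKey if you only "
                "aggregate the grouped values; it can combine values before "
                "the shuffle and is usually more efficient."
            )

# Usage: review a small graph before running it.
p = PipelineGraph([Transform("ParDo", "ExtractWords"),
                   Transform("GroupByKey", "GroupWords")])
reviewer = PipelineReviewer()
p.accept(reviewer)
for w in reviewer.warnings:
    print(w)
```

Because the reviewer only emits suggestions, it stays safe in the cases
where GroupByKey really is what the user wants.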

Amit

On Fri, Feb 19, 2016 at 9:37 AM lonely Feb <[email protected]> wrote:

> Sounds right. Are there any papers about how Cloud Dataflow does
> optimization? Spark, as far as I know, will not change the execution order
> of your transforms but relies on lazy evaluation and its DAG scheduler.
>
> 2016-02-19 13:20 GMT+08:00 Frances Perry <[email protected]>:
>
> > (I'm not familiar with the details of Catalyst itself.)
> >
> > The existing runners (Cloud Dataflow, Spark, Flink) all do optimizations
> > of their own, though it's quite likely there's a set of optimizations
> > that are conceptually shared. For example, something like ParDo fusion
> > is pretty basic to executing the Beam model. However, even that could be
> > tuned very differently depending on the backend you are targeting. So I
> > don't think we should have a shared optimizer for all of Beam. However,
> > if there's a set of graph transformations that are useful to multiple
> > runners, it'd be great to have them written in a general way and put in
> > some kind of runner util package.
> >
> > Frances
> >
> > On Thu, Feb 18, 2016 at 6:37 PM, lonely Feb <[email protected]>
> wrote:
> >
> > > Should we have a common optimization framework for Beam, much like
> > > Spark's Catalyst? Optimization is so significant, but it seems we have
> > > no plans for it?
> > >
> >
>
