Sounds right. Are there any papers about how Cloud Dataflow handles
optimization? Spark, as far as I know, will not change the execution order
of your transforms; it relies on lazy evaluation and its DAG scheduler to
pipeline stages.
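(For anyone following along: the "ParDo fusion" Frances mentions below can be
sketched in a few lines. This is a hypothetical toy illustration, not Beam or
Dataflow code; `fuse` and the sample stage functions are made up for the
example. The idea is that consecutive element-wise transforms are composed
into one function, so the runner makes a single pass over the data instead of
materializing intermediate collections between stages.)

```python
# Toy sketch of ParDo/map fusion (NOT actual Beam runner code):
# compose consecutive element-wise stages into one fused stage.
from functools import reduce

def fuse(*fns):
    """Compose element-wise functions f1, f2, ... into a single function."""
    return lambda x: reduce(lambda acc, f: f(acc), fns, x)

# Three logical pipeline stages...
parse = lambda s: int(s)
double = lambda n: n * 2
fmt = lambda n: f"value={n}"

# ...executed as one fused stage: each element flows through all three
# functions in a single pass, with no intermediate collections.
fused = fuse(parse, double, fmt)
print([fused(s) for s in ["1", "2", "3"]])
# ['value=2', 'value=4', 'value=6']
```

A real runner would of course also have to respect stage boundaries imposed by
shuffles, side inputs, and so on, which is part of why the tuning differs per
backend.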

2016-02-19 13:20 GMT+08:00 Frances Perry <[email protected]>:

> (I'm not familiar with the details of Catalyst itself.)
>
> The existing runners (Cloud Dataflow, Spark, Flink) all do optimizations of
> their own, though it's quite likely there's a set of optimizations that are
> conceptually shared. For example, something like ParDo fusion is pretty
> basic to executing the Beam model. However, even that could be tuned very
> differently depending on the backend you are targeting. So I don't think we
> should have a shared optimizer for all of Beam. However, if there's a set
> of graph transformations that are useful to multiple runners, it'd be great
> to have them written in a general way and put in some kind of runner util
> package.
>
> Frances
>
> On Thu, Feb 18, 2016 at 6:37 PM, lonely Feb <[email protected]> wrote:
>
> > Should we have a common optimization framework for Beam, just the same
> > as Spark Catalyst? Optimization is so significant, but it seems that we
> > have no plans for it?
> >
>
