Sounds right. Are there any papers about how Cloud Dataflow does optimization? Spark, as far as I know, does not change the execution order of your transforms; it relies on lazy evaluation and the DAG scheduler to pipeline them.
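To make the lazy-evaluation point concrete, here is a toy sketch (plain Python, not actual Spark API): transforms only record themselves in a lineage, and nothing executes until an action such as collect() is called.

```python
# Toy illustration of lazy evaluation (illustrative only, not Spark's API):
# map() is a transform that merely records the function; collect() is an
# action that finally runs the whole recorded pipeline in one pass.
class LazyDataset:
    def __init__(self, data, ops=()):
        self.data = data
        self.ops = tuple(ops)

    def map(self, fn):
        # Transform: deferred -- just extend the lineage, do no work yet.
        return LazyDataset(self.data, self.ops + (fn,))

    def collect(self):
        # Action: triggers execution of the accumulated transforms.
        out = list(self.data)
        for fn in self.ops:
            out = [fn(x) for x in out]
        return out

ds = LazyDataset([1, 2, 3]).map(lambda x: x + 1).map(lambda x: x * 2)
print(ds.collect())  # [4, 6, 8]
```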
2016-02-19 13:20 GMT+08:00 Frances Perry <[email protected]>:

> (I'm not familiar with the details of Catalyst itself.)
>
> The existing runners (Cloud Dataflow, Spark, Flink) all do optimizations of
> their own, though it's quite likely there's a set of optimizations that are
> conceptually shared. For example, something like ParDo fusion is pretty
> basic to executing the Beam model. However, even that could be tuned very
> differently depending on the backend you are targeting. So I don't think we
> should have a shared optimizer for all of Beam. However, if there's a set
> of graph transformations that are useful to multiple runners, it'd be great
> to have them written in a general way and put in some kind of runner util
> package.
>
> Frances
>
> On Thu, Feb 18, 2016 at 6:37 PM, lonely Feb <[email protected]> wrote:
>
> > Should we have a common optimization framework for BEAM which just same as
> > Spark Catalyst? Optimization is so significant but it seems that we have
> > no plans for it?
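The ParDo fusion mentioned above can be sketched as a simple graph rewrite: two element-wise transforms applied back-to-back get composed into a single function, so the intermediate collection is never materialized. This is a minimal illustration in plain Python (the names are mine, not Beam's API).

```python
# Sketch of producer-consumer fusion (illustrative, not Beam's API):
# consecutive per-element transforms are composed into one fused
# function, avoiding materialization of the intermediate collection.
def fuse(*fns):
    """Compose per-element functions into a single fused function."""
    def fused(x):
        for fn in fns:
            x = fn(x)
        return x
    return fused

parse = lambda line: int(line.strip())
square = lambda n: n * n
lines = [" 1", "2 ", "3"]

# Unfused: two passes, with an intermediate list in between.
intermediate = [parse(l) for l in lines]
unfused = [square(n) for n in intermediate]

# Fused: one pass over the input, no intermediate list.
fused_fn = fuse(parse, square)
fused = [fused_fn(l) for l in lines]

assert unfused == fused  # same result, one fewer materialization
print(fused)  # [1, 4, 9]
```

A real runner would apply this rewrite to the pipeline DAG before execution, which is why, as noted above, the profitable fusions can differ per backend.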
