Hi everyone,

This is a thread fork from the email thread titled '[dev] Announcing
0.1.0-incubating release'.

In that thread, Amir posed a good question:

   Why is still "Google Cloud Dataflow" included in the Beam release if
Beam is indeed
   an evolution (super-set?) of "Google Cloud Dataflow".Thanks
+regards,Amir-

Many parts of Apache Beam are based on work from Google Cloud Dataflow,
including the Dataflow (now Beam) model, SDKs (Java and Python), and some
of the runners. This work was combined with awesome contributions from
other groups (data Artisans/Apache Flink, Cloudera & PayPal/Apache Spark,
etc.) to form the basis for Apache Beam[1]. Originally, the Cloud Dataflow
SDK included machinery so Dataflow pipelines could be executed on Google
Cloud Dataflow.

An important part of Apache Beam is the ability to execute Beam pipelines
on many runners (see the capability matrix[2] for full details on what
each runner supports). The Beam project includes a runner for Google
Cloud Dataflow, along with others, such as runners for Apache Flink and
Apache Spark. We're also focused on (and excited about!) supporting and
growing new runners. Because Cloud Dataflow support lives in its own
separate runner, that work is cleanly isolated from the larger Apache
Beam effort.
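To make the runner portability concrete: a Beam pipeline is usually
pointed at a runner through a pipeline option, not a code change. A
rough sketch (the main class `com.example.WordCount` and the project
name are placeholders, and exact runner/flag names vary by SDK version):

```shell
# Run the same pipeline on different runners by swapping one option.
# (Illustrative only; runner names depend on the Beam SDK version.)

# Execute locally for development and testing:
mvn exec:java -Dexec.mainClass=com.example.WordCount \
    -Dexec.args="--runner=DirectRunner"

# Execute on the Google Cloud Dataflow service
# (project value is a placeholder):
mvn exec:java -Dexec.mainClass=com.example.WordCount \
    -Dexec.args="--runner=DataflowRunner --project=my-gcp-project"

# Execute on an Apache Flink cluster:
mvn exec:java -Dexec.mainClass=com.example.WordCount \
    -Dexec.args="--runner=FlinkRunner"
```

The pipeline code itself is unchanged across all three invocations;
only the runner option differs.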

So, to summarize:

Beam is based on work from Google Cloud Dataflow, so it's definitely an
evolution. Additionally, Beam includes a runner (one of many) for
Google's Cloud Dataflow service.

Hope that helps!

James

[1]: http://wiki.apache.org/incubator/BeamProposal
[2]: http://beam.incubator.apache.org/capability-matrix
