Hello,

One additional comment/question. I just noticed that Beam users can already write their Beam pipelines and execute them on the Google Cloud Dataflow runner.
I ran a test today and was thrilled to confirm that it works (as JB told me). You can see the SDK version in this image: https://imgur.com/k9HnLnv

The question is: is this some kind of beta, or will it be supported during the transition (before the formal 1.0 release)?

I ask because I suppose many current Google users hesitate to move to Beam for the moment because they don't know that they can already run their pipelines on the Google Cloud Dataflow service. I think it would be a good idea to encourage users to move their data processing pipelines to the Beam version.

Regards,
Ismaël

On Wed, Jun 15, 2016 at 11:21 PM, James Malone < [email protected]> wrote:

> Hi everyone,
>
> This is a thread fork from the email thread titled '[dev] Announcing
> 0.1.0-incubating release'.
>
> In that thread, Amir posed a good question:
>
> Why is still "Google Cloud Dataflow" included in the Beam release if
> Beam is indeed an evolution (super-set?) of "Google Cloud Dataflow".
> Thanks + regards, Amir
>
> Many parts of Apache Beam are based on work from Google Cloud Dataflow,
> including the Dataflow (now Beam) model, SDKs (Java and Python), and some
> of the runners. This work was combined with awesome contributions from
> other groups (data Artisans/Apache Flink, Cloudera & PayPal/Apache Spark,
> etc.) to form the basis for Apache Beam[1]. Originally, the Cloud Dataflow
> SDK included machinery so Dataflow pipelines could be executed on Google
> Cloud Dataflow.
>
> An important part of Apache Beam is the ability to execute Beam pipelines
> on many runners (see the compatibility matrix[2] for full details and
> support). The Beam project includes a runner for Google Cloud Dataflow,
> along with others, such as runners for Apache Flink and Apache Spark. We're
> also focused on (and excited about!) supporting and growing new runners.
> As a separate runner, the work of supporting execution on Cloud Dataflow
> can be separated into the runner, apart from the larger Apache Beam effort.
>
> So, to summarize:
>
> Beam is based on work from Google Cloud Dataflow, so it's definitely an
> evolution. Additionally, Beam includes a runner (one of many) for Google's
> Cloud Dataflow service.
>
> Hope that helps!
>
> James
>
> [1]: http://wiki.apache.org/incubator/BeamProposal
> [2]: http://beam.incubator.apache.org/capability-matrix
