Re: [thread fork] Apache Beam & Google Cloud Dataflow

Ismaël Mejía Thu, 16 Jun 2016 13:06:33 -0700

Hello,

One additional comment / question. I just noticed that Beam users can
already write their Beam pipelines and execute them on the Google Cloud
Dataflow runner.


I just did the test today and I was thrilled to confirm that it worked (as
JB told me).

You can look at the SDK version in the image:
https://imgur.com/k9HnLnv

The question is: is this some kind of beta, or is this going to be
supported during the transition (before the formal 1.0 release)? I ask
because I suppose many current Google users hesitate to move to Beam for
the moment, since they don't know that they can already run their
pipelines on the Google Cloud Dataflow service. I think it would be a good
idea to encourage users to move their data processing pipelines to the
Beam version.

Regards,
Ismaël




On Wed, Jun 15, 2016 at 11:21 PM, James Malone <
[email protected]> wrote:

> Hi everyone,
>
> This is a thread fork from the email thread titled '[dev] Announcing
> 0.1.0-incubating release'.
>
> In that thread, Amir posed a good question:
>
>    Why is still "Google Cloud Dataflow" included in the Beam release if
>    Beam is indeed an evolution (super-set?) of "Google Cloud Dataflow".
>    Thanks + regards, Amir
>
> Many parts of Apache Beam are based on work from Google Cloud Dataflow,
> including the Dataflow (now Beam) model, SDKs (Java and Python), and some
> of the runners. This work was combined with awesome contributions from
> other groups (data Artisans/Apache Flink, Cloudera & PayPal/Apache Spark,
> etc.) to form the basis for Apache Beam[1]. Originally, the Cloud Dataflow
> SDK included machinery so Dataflow pipelines could be executed on Google
> Cloud Dataflow.
>
> An important part of Apache Beam is the ability to execute Beam pipelines
> on many runners (see the capability matrix[2] for full details and
> support). The Beam project includes a runner for Google Cloud Dataflow,
> along with others, such as runners for Apache Flink and Apache Spark. We're
> also focused on (and excited about!) supporting and growing new runners.
> Because it is a separate runner, the work to support execution on Cloud
> Dataflow can be kept in that runner, apart from the larger Apache Beam
> effort.
>
> So, to summarize:
>
> Beam is based on work from Google Cloud Dataflow, so it's definitely an
> evolution. Additionally, Beam includes a runner (one of many) for Google's
> Cloud Dataflow service.
>
> Hope that helps!
>
> James
>
> [1]: http://wiki.apache.org/incubator/BeamProposal
> [2]: http://beam.incubator.apache.org/capability-matrix
>
