Re: [thread fork] Apache Beam & Google Cloud Dataflow

Frances Perry Thu, 16 Jun 2016 22:22:16 -0700

With my Google employee hat on, I'd like to soften that claim a little ;-)

Currently, the Beam SDK runs against Google Cloud Dataflow. But since Beam
isn't itself ready for prime time yet, Google doesn't officially provide
support for running Beam on Cloud Dataflow right now, and Google Cloud
Dataflow customers should still use the original Dataflow Java SDK.


But I, for one, am looking forward to this evolving over the next few
months as Beam stabilizes ;-D


On Thu, Jun 16, 2016 at 9:50 PM, Jean-Baptiste Onofré <[email protected]>
wrote:

> Hi,
>
> As long as you use the Beam Dataflow runner, it should work smoothly.
>
> Regards
> JB
>
>
> On 06/16/2016 10:05 PM, Ismaël Mejía wrote:
>
>> Hello,
>>
>> One additional comment / question. I just noticed that Beam users can
>> already write their Beam pipelines and execute them on the Google Cloud
>> Dataflow runner.
>>
>> I just did the test today and I was thrilled to confirm that it worked (as
>> JB told me).
>>
>> You can look at the SDK version in the image:
>> https://imgur.com/k9HnLnv
>>
>> The question is, is this some kind of beta, or is this going to be
>> supported during the transition (before the formal 1.0 release)? I ask
>> this because I suppose many current Google users hesitate to move to Beam
>> for the moment because they don't know that they can already run their
>> pipelines in the Google Cloud Dataflow service. I think it is a good idea
>> to encourage users to move their data processing pipelines to the Beam
>> version.
>>
>> Regards,
>> Ismaël
>>
>>
>>
>>
>> On Wed, Jun 15, 2016 at 11:21 PM, James Malone <
>> [email protected]> wrote:
>>
>>> Hi everyone,
>>>
>>> This is a thread fork from the email thread titled '[dev] Announcing
>>> 0.1.0-incubating release'.
>>>
>>> In that thread, Amir posed a good question:
>>>
>>>     Why is "Google Cloud Dataflow" still included in the Beam release if
>>>     Beam is indeed an evolution (super-set?) of "Google Cloud Dataflow"?
>>>     Thanks + regards, Amir
>>>
>>> Many parts of Apache Beam are based on work from Google Cloud Dataflow,
>>> including the Dataflow (now Beam) model, SDKs (Java and Python), and some
>>> of the runners. This work was combined with awesome contributions from
>>> other groups (data Artisans/Apache Flink, Cloudera & PayPal/Apache Spark,
>>> etc.) to form the basis for Apache Beam[1]. Originally, the Cloud
>>> Dataflow SDK included machinery so Dataflow pipelines could be executed
>>> on Google Cloud Dataflow.
>>>
>>> An important part of Apache Beam is the ability to execute Beam pipelines
>>> on many runners (see the capability matrix[2] for full details and
>>> support). The Beam project includes a runner for Google Cloud Dataflow,
>>> along with others, such as runners for Apache Flink and Apache Spark.
>>> We're also focused on (and excited about!) supporting and growing new
>>> runners. Because it is a separate runner, the work to support execution
>>> on Cloud Dataflow can be kept within that runner, apart from the larger
>>> Apache Beam effort.
>>>
>>> So, to summarize:
>>>
>>> Beam is based on work from Google Cloud Dataflow, so it's definitely an
>>> evolution. Additionally, Beam includes a runner (one of many) for
>>> Google's Cloud Dataflow service.
>>>
>>> Hope that helps!
>>>
>>> James
>>>
>>> [1]: http://wiki.apache.org/incubator/BeamProposal
>>> [2]: http://beam.incubator.apache.org/capability-matrix
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
