Absolutely ;)

On 06/03/2016 03:51 PM, James Malone wrote:
Hey Silviu!

I think JB is proposing we create a python directory in the sdks directory
in the root repository (and modify the configuration files accordingly):

    https://github.com/apache/incubator-beam/tree/master/sdks
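
For concreteness, a rough sketch of what the layout might look like after
adding the module (the java/ sibling and the exact parent pom.xml change
are my assumptions, not something already decided):

```
incubator-beam/
├── pom.xml            <-- parent POM; would list sdks/python as a <module>
└── sdks/
    ├── java/          <-- existing Java SDK
    └── python/        <-- new Python SDK module
```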

The Beam document titled "Apache Beam (Incubating): Repository
Structure" details the proposed repository structure and may be useful:


https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc

Best,

James



On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu <[email protected]>
wrote:

Hi JB,
Thanks for the welcome! I come from the Python land so  I am not quite
familiar with Maven. What do you mean by a Maven module? You mean an
artifact so you can install things? In Python, people are used to packages
downloaded from PyPI (pypi.python.org -- which is sort of Maven for
Python). Whatever is the standard way of doing things in Apache we'll do
it. Just asking for clarifications.

By the way, this discussion is very useful, since we will have to iron out
several details like this.
Thanks,
Silviu

On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré <[email protected]>
wrote:

Hi Silviu,

thanks for the detailed update and great work!

I would advise creating a:

sdks/python

Maven module to store the Python SDK.

WDYT?

By the way, welcome aboard, and great to have you guys on the team!

Regards
JB

On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:

Hi all,

My name is Silviu Calinoiu and I am a member of the Cloud Dataflow team
working on the Python SDK. As the original Beam proposal
(https://wiki.apache.org/incubator/BeamProposal) mentioned, we have been
planning to merge the Python SDK into Beam. The Python SDK is in an early
stage of development (alpha milestone), so this is a good time to move the
code without causing too much disruption to our customers. Additionally,
this enables the Beam community to contribute as soon as possible.

The current state of the SDK is as follows:

- Open-sourced at https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
- Model: All main concepts are present.
- I/O: The SDK supports text (Google Cloud Storage) and BigQuery
  connectors, and has a framework for adding additional sources and sinks.
- Runners: The SDK has two pipeline runners: a direct runner (in-process,
  local execution) and a Cloud Dataflow runner for batch pipelines
  (submits jobs to the Google Dataflow service). The current direct runner
  is bounded only (batch execution), but there is work in progress to
  support unbounded execution (as in Java).
- Testing: The code base has unit test coverage for all the modules and
  several integration and end-to-end tests (similar in coverage to the
  Java SDK). Streaming is not well tested end to end yet, since Cloud
  Dataflow focused on batch first.
- Docs: We have matching Python documentation for the features currently
  supported by Cloud Dataflow. The docs are on cloud.google.com (access by
  whitelist only, due to the alpha stage of the project). Devin is working
  on the transition of all docs to Apache.
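
For anyone who has not seen the SDK yet, here is a toy Python sketch of
what the bounded direct runner conceptually does. This is not the SDK's
actual API, just an illustration of eager, in-process execution over a
bounded collection:

```python
# Toy illustration only: NOT the SDK's actual API.
# A bounded "direct runner" conceptually applies each transform
# eagerly, in process, over an in-memory (bounded) collection.

def direct_run(pcollection, *transforms):
    """Apply each element-wise transform in order over bounded input."""
    for transform in transforms:
        pcollection = [transform(element) for element in pcollection]
    return pcollection

result = direct_run(
    ['hello', 'beam'],
    str.upper,             # a Map-style transform
    lambda s: s + '!',     # another Map-style transform
)
print(result)  # ['HELLO!', 'BEAM!']
```

The unbounded (streaming) case is harder precisely because the input
never finishes, which is why that support is still in progress.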


In the coming days/weeks we would like to prepare and start migrating the
code, and you should start seeing some pull requests. We also hope that
the Beam community will shape the SDK going forward. In particular, all
the model improvements implemented for Java (Runner API, etc.) will have
equivalents in Python once they stabilize. If you have any advice before
we start the journey, please let us know.

The team that will join the Beam effort consists of me (Silviu Calinoiu),
Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but not least
Robert Bradshaw (who is already an Apache Beam committer).

So let us know what you think!

Best regards,

Silviu


--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com




--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
