Hi all,

My name is Silviu Calinoiu and I am a member of the Cloud Dataflow team
working on the Python SDK. As the original Beam proposal
(https://wiki.apache.org/incubator/BeamProposal) mentioned, we have been
planning to merge the Python SDK into Beam. The Python SDK is in an early
stage of development (alpha milestone) and so this is a good time to move
the code without causing too much disruption to our customers.
Additionally, this enables the Beam community to contribute as soon as
possible.

The current state of the SDK is as follows:

   - Open-sourced at https://github.com/GoogleCloudPlatform/DataflowPythonSDK/

   - Model: All main concepts are present.

   - I/O: SDK supports text (Google Cloud Storage) and BigQuery connectors
     and has a framework for adding additional sources and sinks.

   - Runners: SDK has two pipeline runners: direct runner (in process, local
     execution) and Cloud Dataflow runner for batch pipelines (submit job to
     Google Dataflow service). The current direct runner is bounded only
     (batch execution) but there is work in progress to support unbounded
     (as in Java).

   - Testing: The code base has unit test coverage for all the modules and
     several integration and end to end tests (similar in coverage to the
     Java SDK). Streaming is not well tested end to end yet since Cloud
     Dataflow focused first on batch.

   - Docs: We have matching Python documentation for the features currently
     supported by Cloud Dataflow. The docs are on cloud.google.com (access
     only by whitelist due to the alpha stage of the project). Devin is
     working on the transition of all docs to Apache.

Over the next few days and weeks we would like to prepare and start migrating
the code, so you should start seeing some pull requests. We also hope that the
Beam community will shape the SDK going forward. In particular, all the
model improvements implemented for Java (Runner API, etc.) will have
equivalents in Python once they stabilize. If you have any advice before we
start the journey, please let us know.

The team that will join the Beam effort consists of me (Silviu Calinoiu),
Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but not least
Robert Bradshaw (who is already an Apache Beam committer).

So let us know what you think!

Best regards,

Silviu
