Hi all,

My name is Silviu Calinoiu and I am a member of the Cloud Dataflow team working on the Python SDK. As the original Beam proposal (https://wiki.apache.org/incubator/BeamProposal) mentioned, we have been planning to merge the Python SDK into Beam. The Python SDK is in an early stage of development (alpha milestone), so this is a good time to move the code without causing too much disruption to our customers. Additionally, this enables the Beam community to contribute as soon as possible.

The current state of the SDK is as follows:

- Open-sourced at https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
- Model: All main concepts are present.
- I/O: The SDK supports text (Google Cloud Storage) and BigQuery connectors and has a framework for adding additional sources and sinks.
- Runners: The SDK has two pipeline runners: a direct runner (in-process, local execution) and a Cloud Dataflow runner for batch pipelines (submits the job to the Google Dataflow service). The current direct runner is bounded only (batch execution), but there is work in progress to support unbounded (as in Java).
- Testing: The code base has unit test coverage for all the modules and several integration and end-to-end tests (similar in coverage to the Java SDK). Streaming is not well tested end to end yet, since Cloud Dataflow focused first on batch.
- Docs: We have matching Python documentation for the features currently supported by Cloud Dataflow. The docs are on cloud.google.com (access only by whitelist due to the alpha stage of the project). Devin is working on the transition of all docs to Apache.

In the coming days and weeks we would like to prepare and start migrating the code, and you should start seeing some pull requests. We also hope that the Beam community will shape the SDK going forward. In particular, all the model improvements implemented for Java (Runner API, etc.) will have equivalents in Python once they stabilize. If you have any advice before we start the journey, please let us know.

The team that will join the Beam effort consists of me (Silviu Calinoiu), Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but not least Robert Bradshaw (who is already an Apache Beam committer).

So let us know what you think!

Best regards,
Silviu
