Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!

On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré <[email protected]> wrote:
> Absolutely ;)
>
> On 06/03/2016 03:51 PM, James Malone wrote:
>> Hey Silviu!
>>
>> I think JB is proposing we create a python directory in the sdks directory
>> in the root repository (and modify the configuration files accordingly):
>>
>> https://github.com/apache/incubator-beam/tree/master/sdks
>>
>> This Beam document here titled "Apache Beam (Incubating): Repository
>> Structure" details the proposed repository structure and may be useful:
>>
>> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
>>
>> Best,
>>
>> James
>>
>> On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu <[email protected]> wrote:
>>> Hi JB,
>>> Thanks for the welcome! I come from the Python land so I am not quite
>>> familiar with Maven. What do you mean by a Maven module? You mean an
>>> artifact so you can install things? In Python, people are used to
>>> packages downloaded from PyPI (pypi.python.org -- which is sort of
>>> Maven for Python). Whatever is the standard way of doing things in
>>> Apache we'll do it. Just asking for clarifications.
>>>
>>> By the way this discussion is very useful since we will have to iron out
>>> several details like this.
>>> Thanks,
>>> Silviu
>>>
>>> On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré <[email protected]> wrote:
>>>> Hi Silviu,
>>>>
>>>> thanks for detailed update and great work !
>>>>
>>>> I would advice to create a:
>>>>
>>>> sdks/python
>>>>
>>>> Maven module to store the Python SDK.
>>>>
>>>> WDYT ?
>>>>
>>>> By the way, welcome aboard and great to have you all guys in the team !
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
>>>>> Hi all,
>>>>>
>>>>> My name is Silviu Calinoiu and I am a member of the Cloud Dataflow team
>>>>> working on the Python SDK. As the original Beam proposal
>>>>> (https://wiki.apache.org/incubator/BeamProposal) mentioned, we have
>>>>> been planning to merge the Python SDK into Beam. The Python SDK is in
>>>>> an early stage of development (alpha milestone) and so this is a good
>>>>> time to move the code without causing too much disruption to our
>>>>> customers. Additionally, this enables the Beam community to contribute
>>>>> as soon as possible.
>>>>>
>>>>> The current state of the SDK is as follows:
>>>>>
>>>>> - Open-sourced at
>>>>>   https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
>>>>>
>>>>> - Model: All main concepts are present.
>>>>>
>>>>> - I/O: SDK supports text (Google Cloud Storage) and BigQuery
>>>>>   connectors and has a framework for adding additional sources and
>>>>>   sinks.
>>>>>
>>>>> - Runners: SDK has two pipeline runners: direct runner (in process,
>>>>>   local execution) and Cloud Dataflow runner for batch pipelines
>>>>>   (submit job to Google Dataflow service). The current direct runner
>>>>>   is bounded only (batch execution) but there is work in progress to
>>>>>   support unbounded (as in Java).
>>>>>
>>>>> - Testing: The code base has unit test coverage for all the modules
>>>>>   and several integration and end to end tests (similar in coverage
>>>>>   to the Java SDK). Streaming is not well tested end to end yet since
>>>>>   Cloud Dataflow focused first on batch.
>>>>>
>>>>> - Docs: We have matching Python documentation for the features
>>>>>   currently supported by Cloud Dataflow. The docs are on
>>>>>   cloud.google.com (access only by whitelist due to the alpha stage
>>>>>   of the project). Devin is working on the transition of all docs to
>>>>>   Apache.
>>>>>
>>>>> In the next days/weeks we would like to prepare and start migrating the
>>>>> code and you should start seeing some pull requests. We also hope that
>>>>> the Beam community will shape the SDK going forward. In particular, all
>>>>> the model improvements implemented for Java (Runner API, etc.) will
>>>>> have equivalents in Python once they stabilize. If you have any advice
>>>>> before we start the journey please let us know.
>>>>>
>>>>> The team that will join the Beam effort consists of me (Silviu
>>>>> Calinoiu), Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but
>>>>> not least Robert Bradshaw (who is already an Apache Beam committer).
>>>>>
>>>>> So let us know what you think!
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Silviu
>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> [email protected]
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
