Welcome Python people ;) I know a few people who've been waiting for this one!
On Fri, Jun 3, 2016, 19:53 Davor Bonaci <[email protected]> wrote:

> Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!
>
> On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré <[email protected]>
> wrote:
>
> > Absolutely ;)
> >
> > On 06/03/2016 03:51 PM, James Malone wrote:
> >
> >> Hey Silviu!
> >>
> >> I think JB is proposing we create a python directory in the sdks
> >> directory in the root repository (and modify the configuration files
> >> accordingly):
> >>
> >> https://github.com/apache/incubator-beam/tree/master/sdks
> >>
> >> This Beam document here titled "Apache Beam (Incubating): Repository
> >> Structure" details the proposed repository structure and may be useful:
> >>
> >> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
> >>
> >> Best,
> >>
> >> James
> >>
> >> On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu <[email protected]>
> >> wrote:
> >>
> >>> Hi JB,
> >>> Thanks for the welcome! I come from the Python land so I am not quite
> >>> familiar with Maven. What do you mean by a Maven module? You mean an
> >>> artifact so you can install things? In Python, people are used to
> >>> packages downloaded from PyPI (pypi.python.org -- which is sort of
> >>> Maven for Python). Whatever is the standard way of doing things in
> >>> Apache we'll do it. Just asking for clarifications.
> >>>
> >>> By the way this discussion is very useful since we will have to iron
> >>> out several details like this.
> >>> Thanks,
> >>> Silviu
> >>>
> >>> On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi Silviu,
> >>>>
> >>>> thanks for detailed update and great work !
> >>>>
> >>>> I would advice to create a:
> >>>>
> >>>> sdks/python
> >>>>
> >>>> Maven module to store the Python SDK.
> >>>>
> >>>> WDYT ?
> >>>>
> >>>> By the way, welcome aboard and great to have you all guys in the
> >>>> team !
> >>>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> My name is Silviu Calinoiu and I am a member of the Cloud Dataflow
> >>>>> team working on the Python SDK. As the original Beam proposal
> >>>>> (https://wiki.apache.org/incubator/BeamProposal) mentioned, we have
> >>>>> been planning to merge the Python SDK into Beam. The Python SDK is
> >>>>> in an early stage of development (alpha milestone) and so this is a
> >>>>> good time to move the code without causing too much disruption to
> >>>>> our customers. Additionally, this enables the Beam community to
> >>>>> contribute as soon as possible.
> >>>>>
> >>>>> The current state of the SDK is as follows:
> >>>>>
> >>>>> - Open-sourced at
> >>>>>   https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
> >>>>>
> >>>>> - Model: All main concepts are present.
> >>>>>
> >>>>> - I/O: SDK supports text (Google Cloud Storage) and BigQuery
> >>>>>   connectors and has a framework for adding additional sources and
> >>>>>   sinks.
> >>>>>
> >>>>> - Runners: SDK has two pipeline runners: direct runner (in process,
> >>>>>   local execution) and Cloud Dataflow runner for batch pipelines
> >>>>>   (submit job to Google Dataflow service). The current direct
> >>>>>   runner is bounded only (batch execution) but there is work in
> >>>>>   progress to support unbounded (as in Java).
> >>>>>
> >>>>> - Testing: The code base has unit test coverage for all the modules
> >>>>>   and several integration and end to end tests (similar in coverage
> >>>>>   to the Java SDK). Streaming is not well tested end to end yet
> >>>>>   since Cloud Dataflow focused first on batch.
> >>>>>
> >>>>> - Docs: We have matching Python documentation for the features
> >>>>>   currently supported by Cloud Dataflow. The docs are on
> >>>>>   cloud.google.com (access only by whitelist due to the alpha stage
> >>>>>   of the project). Devin is working on the transition of all docs
> >>>>>   to Apache.
> >>>>>
> >>>>> In the next days/weeks we would like to prepare and start migrating
> >>>>> the code and you should start seeing some pull requests. We also
> >>>>> hope that the Beam community will shape the SDK going forward. In
> >>>>> particular, all the model improvements implemented for Java (Runner
> >>>>> API, etc.) will have equivalents in Python once they stabilize. If
> >>>>> you have any advice before we start the journey please let us know.
> >>>>>
> >>>>> The team that will join the Beam effort consists of me (Silviu
> >>>>> Calinoiu), Charles Chen, Ahmet Altay, Chamikara Jayalath, and last
> >>>>> but not least Robert Bradshaw (who is already an Apache Beam
> >>>>> committer).
> >>>>>
> >>>>> So let us know what you think!
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Silviu
> >>>>
> >>>> --
> >>>> Jean-Baptiste Onofré
> >>>> [email protected]
> >>>> http://blog.nanthrax.net
> >>>> Talend - http://www.talend.com
> >
> > --
> > Jean-Baptiste Onofré
> > [email protected]
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
