Woo hoo!
On Tue, Jun 14, 2016 at 12:41 PM, Jean-Baptiste Onofré <[email protected]> wrote:
> Awesome! Thanks!
>
> Agree with Davor to create a feature branch.
>
> Regards
> JB
>
> On 06/14/2016 09:22 PM, Silviu Calinoiu wrote:
>> Thanks everybody for the welcome and feedback. The initial code move
>> was proposed as pull request #461 [1].
>>
>> Looking forward to working with everybody in the Beam community and
>> especially any Pythonistas out there.
>>
>> Thanks,
>> Silviu
>>
>> [1] https://github.com/apache/incubator-beam/pull/461
>>
>> On Sat, Jun 4, 2016 at 12:35 AM, Ismaël Mejía <[email protected]> wrote:
>>> Excellent guys, welcome to Beam!
>>>
>>> I am looking for ways to integrate Beam with the standard notebook
>>> tools (Zeppelin / Jupyter [ipython]), so I am really happy to see the
>>> Python SDK arriving in Beam. Awesome.
>>>
>>> Ismaël Mejía
>>>
>>> On Fri, Jun 3, 2016 at 7:17 PM, Amit Sela <[email protected]> wrote:
>>>> Welcome Python people ;)
>>>>
>>>> I know a few people who've been waiting for this one!
>>>>
>>>> On Fri, Jun 3, 2016, 19:53 Davor Bonaci <[email protected]> wrote:
>>>>> Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!
>>>>>
>>>>> On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré <[email protected]>
>>>>> wrote:
>>>>>> Absolutely ;)
>>>>>>
>>>>>> On 06/03/2016 03:51 PM, James Malone wrote:
>>>>>>> Hey Silviu!
>>>>>>>
>>>>>>> I think JB is proposing we create a python directory in the sdks
>>>>>>> directory in the root repository (and modify the configuration
>>>>>>> files accordingly):
>>>>>>>
>>>>>>> https://github.com/apache/incubator-beam/tree/master/sdks
>>>>>>>
>>>>>>> This Beam document here titled "Apache Beam (Incubating):
>>>>>>> Repository Structure" details the proposed repository structure
>>>>>>> and may be useful:
>>>>>>>
>>>>>>> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> James
>>>>>>>
>>>>>>> On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu <[email protected]>
>>>>>>> wrote:
>>>>>>>> Hi JB,
>>>>>>>>
>>>>>>>> Thanks for the welcome! I come from Python land, so I am not
>>>>>>>> quite familiar with Maven. What do you mean by a Maven module?
>>>>>>>> You mean an artifact so you can install things? In Python,
>>>>>>>> people are used to packages downloaded from PyPI
>>>>>>>> (pypi.python.org -- which is sort of Maven for Python). Whatever
>>>>>>>> is the standard way of doing things in Apache, we'll do it. Just
>>>>>>>> asking for clarification.
>>>>>>>>
>>>>>>>> By the way, this discussion is very useful since we will have to
>>>>>>>> iron out several details like this.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Silviu
>>>>>>>>
>>>>>>>> On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré
>>>>>>>> <[email protected]> wrote:
>>>>>>>>> Hi Silviu,
>>>>>>>>>
>>>>>>>>> Thanks for the detailed update and great work!
>>>>>>>>>
>>>>>>>>> I would advise creating a:
>>>>>>>>>
>>>>>>>>> sdks/python
>>>>>>>>>
>>>>>>>>> Maven module to store the Python SDK.
>>>>>>>>>
>>>>>>>>> WDYT?
>>>>>>>>>
>>>>>>>>> By the way, welcome aboard and great to have you all on the
>>>>>>>>> team!
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> My name is Silviu Calinoiu and I am a member of the Cloud
>>>>>>>>>> Dataflow team working on the Python SDK. As the original Beam
>>>>>>>>>> proposal (https://wiki.apache.org/incubator/BeamProposal)
>>>>>>>>>> mentioned, we have been planning to merge the Python SDK into
>>>>>>>>>> Beam. The Python SDK is in an early stage of development
>>>>>>>>>> (alpha milestone) and so this is a good time to move the code
>>>>>>>>>> without causing too much disruption to our customers.
>>>>>>>>>> Additionally, this enables the Beam community to contribute
>>>>>>>>>> as soon as possible.
>>>>>>>>>>
>>>>>>>>>> The current state of the SDK is as follows:
>>>>>>>>>>
>>>>>>>>>> - Open-sourced at
>>>>>>>>>>   https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
>>>>>>>>>>
>>>>>>>>>> - Model: All main concepts are present.
>>>>>>>>>>
>>>>>>>>>> - I/O: SDK supports text (Google Cloud Storage) and BigQuery
>>>>>>>>>>   connectors and has a framework for adding additional
>>>>>>>>>>   sources and sinks.
>>>>>>>>>>
>>>>>>>>>> - Runners: SDK has two pipeline runners: direct runner (in
>>>>>>>>>>   process, local execution) and Cloud Dataflow runner for
>>>>>>>>>>   batch pipelines (submit job to Google Dataflow service).
>>>>>>>>>>   The current direct runner is bounded only (batch execution)
>>>>>>>>>>   but there is work in progress to support unbounded (as in
>>>>>>>>>>   Java).
>>>>>>>>>>
>>>>>>>>>> - Testing: The code base has unit test coverage for all the
>>>>>>>>>>   modules and several integration and end to end tests
>>>>>>>>>>   (similar in coverage to the Java SDK). Streaming is not
>>>>>>>>>>   well tested end to end yet since Cloud Dataflow focused
>>>>>>>>>>   first on batch.
>>>>>>>>>>
>>>>>>>>>> - Docs: We have matching Python documentation for the
>>>>>>>>>>   features currently supported by Cloud Dataflow. The docs
>>>>>>>>>>   are on cloud.google.com (access only by whitelist due to
>>>>>>>>>>   the alpha stage of the project). Devin is working on the
>>>>>>>>>>   transition of all docs to Apache.
>>>>>>>>>>
>>>>>>>>>> In the next days/weeks we would like to prepare and start
>>>>>>>>>> migrating the code and you should start seeing some pull
>>>>>>>>>> requests. We also hope that the Beam community will shape the
>>>>>>>>>> SDK going forward. In particular, all the model improvements
>>>>>>>>>> implemented for Java (Runner API, etc.) will have equivalents
>>>>>>>>>> in Python once they stabilize. If you have any advice before
>>>>>>>>>> we start the journey please let us know.
>>>>>>>>>>
>>>>>>>>>> The team that will join the Beam effort consists of me
>>>>>>>>>> (Silviu Calinoiu), Charles Chen, Ahmet Altay, Chamikara
>>>>>>>>>> Jayalath, and last but not least Robert Bradshaw (who is
>>>>>>>>>> already an Apache Beam committer).
>>>>>>>>>>
>>>>>>>>>> So let us know what you think!
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> Silviu
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>> [email protected]
>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>> Talend - http://www.talend.com
