Excellent guys, welcome to Beam! I am looking for ways to integrate Beam with the standard notebook tools (Zeppelin / Jupyter [IPython]), so I am really happy to see the Python SDK arriving in Beam. Awesome.
Ismaël Mejía

On Fri, Jun 3, 2016 at 7:17 PM, Amit Sela <[email protected]> wrote:

> Welcome Python people ;)
>
> I know a few people who've been waiting for this one!
>
> On Fri, Jun 3, 2016, 19:53 Davor Bonaci <[email protected]> wrote:
>
> > Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!
> >
> > On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré <[email protected]> wrote:
> >
> > > Absolutely ;)
> > >
> > > On 06/03/2016 03:51 PM, James Malone wrote:
> > >
> > >> Hey Silviu!
> > >>
> > >> I think JB is proposing we create a python directory in the sdks directory
> > >> in the root repository (and modify the configuration files accordingly):
> > >>
> > >> https://github.com/apache/incubator-beam/tree/master/sdks
> > >>
> > >> This Beam document here titled "Apache Beam (Incubating): Repository
> > >> Structure" details the proposed repository structure and may be useful:
> > >>
> > >> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
> > >>
> > >> Best,
> > >>
> > >> James
> > >>
> > >> On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu <[email protected]> wrote:
> > >>
> > >>> Hi JB,
> > >>> Thanks for the welcome! I come from the Python land so I am not quite
> > >>> familiar with Maven. What do you mean by a Maven module? You mean an
> > >>> artifact so you can install things? In Python, people are used to packages
> > >>> downloaded from PyPI (pypi.python.org -- which is sort of Maven for
> > >>> Python). Whatever is the standard way of doing things in Apache we'll do
> > >>> it. Just asking for clarifications.
> > >>>
> > >>> By the way this discussion is very useful since we will have to iron out
> > >>> several details like this.
> > >>>
> > >>> Thanks,
> > >>> Silviu
> > >>>
> > >>> On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré <[email protected]> wrote:
> > >>>
> > >>>> Hi Silviu,
> > >>>>
> > >>>> thanks for detailed update and great work !
> > >>>>
> > >>>> I would advice to create a:
> > >>>>
> > >>>> sdks/python
> > >>>>
> > >>>> Maven module to store the Python SDK.
> > >>>>
> > >>>> WDYT ?
> > >>>>
> > >>>> By the way, welcome aboard and great to have you all guys in the team !
> > >>>>
> > >>>> Regards
> > >>>> JB
> > >>>>
> > >>>> On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
> > >>>>
> > >>>>> Hi all,
> > >>>>>
> > >>>>> My name is Silviu Calinoiu and I am a member of the Cloud Dataflow team
> > >>>>> working on the Python SDK. As the original Beam proposal (
> > >>>>> https://wiki.apache.org/incubator/BeamProposal) mentioned, we have been
> > >>>>> planning to merge the Python SDK into Beam. The Python SDK is in an early
> > >>>>> stage of development (alpha milestone) and so this is a good time to move
> > >>>>> the code without causing too much disruption to our customers.
> > >>>>> Additionally, this enables the Beam community to contribute as soon as
> > >>>>> possible.
> > >>>>>
> > >>>>> The current state of the SDK is as follows:
> > >>>>>
> > >>>>> - Open-sourced at
> > >>>>>   https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
> > >>>>> - Model: All main concepts are present.
> > >>>>> - I/O: SDK supports text (Google Cloud Storage) and BigQuery connectors
> > >>>>>   and has a framework for adding additional sources and sinks.
> > >>>>> - Runners: SDK has two pipeline runners: direct runner (in process, local
> > >>>>>   execution) and Cloud Dataflow runner for batch pipelines (submit job to
> > >>>>>   Google Dataflow service). The current direct runner is bounded only
> > >>>>>   (batch execution) but there is work in progress to support unbounded
> > >>>>>   (as in Java).
> > >>>>> - Testing: The code base has unit test coverage for all the modules and
> > >>>>>   several integration and end to end tests (similar in coverage to the
> > >>>>>   Java SDK). Streaming is not well tested end to end yet since Cloud
> > >>>>>   Dataflow focused first on batch.
> > >>>>> - Docs: We have matching Python documentation for the features currently
> > >>>>>   supported by Cloud Dataflow. The docs are on cloud.google.com (access
> > >>>>>   only by whitelist due to the alpha stage of the project). Devin is
> > >>>>>   working on the transition of all docs to Apache.
> > >>>>>
> > >>>>> In the next days/weeks we would like to prepare and start migrating the
> > >>>>> code and you should start seeing some pull requests. We also hope that the
> > >>>>> Beam community will shape the SDK going forward. In particular, all the
> > >>>>> model improvements implemented for Java (Runner API, etc.) will have
> > >>>>> equivalents in Python once they stabilize. If you have any advice before
> > >>>>> we start the journey please let us know.
> > >>>>>
> > >>>>> The team that will join the Beam effort consists of me (Silviu Calinoiu),
> > >>>>> Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but not least
> > >>>>> Robert Bradshaw (who is already an Apache Beam committer).
> > >>>>>
> > >>>>> So let us know what you think!
> > >>>>>
> > >>>>> Best regards,
> > >>>>>
> > >>>>> Silviu
> > >>>>
> > >>>> --
> > >>>> Jean-Baptiste Onofré
> > >>>> [email protected]
> > >>>> http://blog.nanthrax.net
> > >>>> Talend - http://www.talend.com
> > >
> > > --
> > > Jean-Baptiste Onofré
> > > [email protected]
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
