+1 to merging after 0.5. It's on a great trajectory in terms of development and community.
On Tue, Jan 17, 2017 at 5:48 PM, Kenneth Knowles <k...@google.com.invalid> wrote:

> Seems reasonable, and the timeline Davor suggests makes a lot of sense.
>
> On Tue, Jan 17, 2017 at 3:59 PM, Lukasz Cwik <lc...@google.com.invalid>
> wrote:
>
> > I'm also for merging to master.
> >
> > On Tue, Jan 17, 2017 at 3:39 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> > > It makes sense to merge after the 0.5.0 release.
> > >
> > > Good point Davor: +1
> > >
> > > Regards
> > > JB
> > >
> > >
> > > On 01/17/2017 03:34 PM, Davor Bonaci wrote:
> > >
> > >> +1. I think merging to master would be an awesome next step for the
> > >> Python SDK.
> > >>
> > >> And, thanks for a great summary of the current state, roadmap, and
> > >> impact to the project as a whole -- awesome!
> > >>
> > >> Process-wise, I'd suggest starting a formal vote once this discussion
> > >> seems to be trending towards a conclusion, and complete the merge as
> > >> soon as the next release (0.5.0) is cut. This would enable additional
> > >> time before 0.6.0 to figure out compliance, release process impact,
> > >> etc.
> > >>
> > >> Great work everyone!
> > >>
> > >> On Tue, Jan 17, 2017 at 8:26 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> > >> wrote:
> > >>
> > >>> Hi
> > >>>
> > >>> I didn't try the Python SDK recently but you provided a clear "state
> > >>> of the art". Anyway I'm in favor of merging things as quickly as
> > >>> possible (assuming it's in good shape in terms of build, test, ...):
> > >>> it would potentially grow the "external" contributions.
> > >>>
> > >>> So +1 from my side.
> > >>>
> > >>> Regards
> > >>> JB
> > >>>
> > >>> On Jan 17, 2017, 08:22, at 08:22, Ahmet Altay <al...@google.com.INVALID>
> > >>> wrote:
> > >>>
> > >>>> Hi all,
> > >>>>
> > >>>> tl;dr: I would like to start a discussion about merging python-sdk
> > >>>> branch to master branch.
> > >>>> Python SDK is mature enough and merging it to master will
> > >>>> accelerate its development and adoption.
> > >>>>
> > >>>> With a great effort from a lot of contributors(*), Python SDK [1] is
> > >>>> now a mostly complete, tested, performant Python implementation of
> > >>>> the Beam model. Since June, when we first started with Python SDK in
> > >>>> Apache Beam, we have been continuously improving it.
> > >>>>
> > >>>> ** Python SDK currently supports:
> > >>>>
> > >>>> * Model: All main concepts are present (ParDo, GroupByKey, Windowing
> > >>>> etc.).
> > >>>> * IO: There are extensible APIs for writing new bounded sources and
> > >>>> sinks. Implementations are provided for Text, Avro, BigQuery, and
> > >>>> Datastore.
> > >>>> * Runners: Python SDK has an extensible base runner module that
> > >>>> allows building specific runners on top of it. The SDK comes with two
> > >>>> pipeline runners: DirectRunner and DataflowRunner; and it is possible
> > >>>> to add more. The existing runners are currently limited to bounded
> > >>>> execution and are otherwise equivalent to their Java SDK counterparts
> > >>>> in functionality.
> > >>>> * Testing: Python SDK implements the ValidatesRunner test framework
> > >>>> for writing integration tests for current and future runners. There
> > >>>> is unit test coverage for all modules, and a number of integration
> > >>>> tests for validating existing runners.
> > >>>> * Documentation and examples: Documentation work has started on
> > >>>> Python SDK. The Beam Programming Guide page has been updated to
> > >>>> include Python [2]. The code comes with many ready-to-use examples
> > >>>> and we are in a good place to start documenting those on the website.
> > >>>>
> > >>>> ** We are not done yet; next on the roadmap we have:
> > >>>>
> > >>>> * Streaming: Both of the existing runners lack support for streaming
> > >>>> execution, and currently there is work going on for adding streaming
> > >>>> support to DirectRunner [3].
> > >>>> * Documentation: Filling the rest of the Beam documentation with
> > >>>> Python SDK specific information and examples.
> > >>>> * SDK consistency: Making Python SDK consistent with the Java SDK. We
> > >>>> have come a long way on this and have only a few items left [4].
> > >>>> * Beamifying: We have been working on removing Dataflow-specific
> > >>>> references both from the documentation and from the code. There is
> > >>>> some work left, and we are currently working on that as well [5].
> > >>>>
> > >>>> ** Steps and implications of merging to master:
> > >>>>
> > >>>> * Master branch is merged to python-sdk branch at regular intervals
> > >>>> and the last merge was on 12/22. All the past merges were uneventful
> > >>>> because there is a minimal overlap in modified files between
> > >>>> branches. Integrating python-sdk into master will similarly touch a
> > >>>> small number of existing files.
> > >>>>
> > >>>> * Python SDK is using the same tools for building and testing. It is
> > >>>> already integrated with Maven, Jenkins and Travis. Specifically the
> > >>>> impact to the testing infrastructure would be:
> > >>>> - There will be two additional test configurations in Travis. Since
> > >>>> Travis runs all configurations in parallel there should not be a
> > >>>> noticeable change in the Travis run time.
> > >>>> - Jenkins pre-commit test will start running the Python SDK tests. It
> > >>>> will add an additional 5 minutes to the completion time of the
> > >>>> pre-commit test. Historically Python SDK tests were not flaky and did
> > >>>> not cause any random failures.
> > >>>> - Jenkins Python post-commit test is already separated from the
> > >>>> other post-commit tests and will continue to exist. It would not
> > >>>> change the testing time for any other test.
> > >>>>
> > >>>> * The release process needs to be updated to accommodate releasing
> > >>>> Python artifacts. Python SDK would fit in the existing release
> > >>>> schedule and could be released along with the Java SDK. The
> > >>>> additional steps would include:
> > >>>> - Generating Python artifacts. This could be done with a single
> > >>>> command using Maven today.
> > >>>> - Publishing the artifacts to a central repository such as PyPI.
> > >>>> - Updating the release guide to reflect the changes above.
> > >>>>
> > >>>> * Users: There are existing users using the Python SDK. To give a
> > >>>> rough estimate, a distribution of the Beam Python SDK had a total of
> > >>>> 23K downloads in the past 6 months [6]. Some of those users are
> > >>>> already engaged with the community (e.g. [7]). There might be an
> > >>>> increased amount of engagement from the rest of them after the merge.
> > >>>>
> > >>>> Looking forward to hearing your thoughts and comments on “graduating”
> > >>>> python-sdk to the master.
> > >>>>
> > >>>> Thank you,
> > >>>> Ahmet
> > >>>>
> > >>>> (*) Python SDK branch currently has a diverse group of contributors.
> > >>>> Regular contributors include Charles Chen, Chamikara Jayalath, María
> > >>>> García Herrero, Mark Liu, Pablo Estrada, Robert Bradshaw (Apache Beam
> > >>>> PMC), Sourabh Bajaj, and Vikas Kedigehalli. We have also had
> > >>>> contributions from Abdullah Bashir, Marco Buccini, Sergio Fernández,
> > >>>> Seunghyun Lee, and Younghee Kwon.
> > >>>>
> > >>>> [1] https://github.com/apache/beam/tree/python-sdk/sdks/python
> > >>>> [2] https://beam.apache.org/documentation/programming-guide/
> > >>>> [3] https://issues.apache.org/jira/browse/BEAM-1265
> > >>>> [4] https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20labels%20%3D%20sdk-consistency
> > >>>> [5] https://issues.apache.org/jira/browse/BEAM-1218
> > >>>> [6] https://pypi.python.org/pypi/google-cloud-dataflow/json
> > >>>> [7] https://issues.apache.org/jira/browse/BEAM-1251
> > >>>>
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com