I'm also for merging to master.

On Tue, Jan 17, 2017 at 3:39 PM, Jean-Baptiste Onofré <[email protected]> wrote:
> It makes sense to merge after the 0.5.0 release.
>
> Good point Davor: +1
>
> Regards
> JB
>
> On 01/17/2017 03:34 PM, Davor Bonaci wrote:
>
>> +1. I think merging to master would be an awesome next step for the
>> Python SDK.
>>
>> And, thanks for a great summary of the current state, roadmap, and
>> impact to the project as a whole -- awesome!
>>
>> Process-wise, I'd suggest starting a formal vote once this discussion
>> seems to be trending towards a conclusion, and complete the merge as
>> soon as the next release (0.5.0) is cut. This would enable additional
>> time before 0.6.0 to figure out compliance, release process impact, etc.
>>
>> Great work everyone!
>>
>> On Tue, Jan 17, 2017 at 8:26 AM, Jean-Baptiste Onofré <[email protected]>
>> wrote:
>>
>>> Hi
>>>
>>> I didn't try the Python SDK recently but you provided a clear "state
>>> of the art". Anyway I'm in favor of merging things as quickly as
>>> possible (assuming it's in good shape in terms of build, test, ...):
>>> it would potentially grow the "external" contributions.
>>>
>>> So +1 from my side.
>>>
>>> Regards
>>> JB
>>>
>>> On Jan 17, 2017, 08:22, at 08:22, Ahmet Altay <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> tl;dr: I would like to start a discussion about merging python-sdk
>>>> branch to master branch. Python SDK is mature enough and merging it
>>>> to master will accelerate its development and adoption.
>>>>
>>>> With a great effort from a lot of contributors(*), Python SDK [1] is
>>>> now a mostly complete, tested, performant Python implementation of
>>>> the Beam model. Since June, when we first started with Python SDK in
>>>> Apache Beam we have been continuously improving it.
>>>>
>>>> ** Python SDK currently supports:
>>>>
>>>> * Model: All main concepts are present (ParDo, GroupByKey, Windowing
>>>> etc.).
>>>> * IO: There are extensible APIs for writing new bounded sources and
>>>> sinks.
>>>> Implementations are provided for Text, Avro, BigQuery, and Datastore.
>>>> * Runners: Python SDK has an extensible base runner module that
>>>> allows building specific runners on top of it. The SDK comes with two
>>>> pipeline runners: DirectRunner and DataflowRunner; and it is possible
>>>> to add more. The existing runners are currently limited to bounded
>>>> execution and otherwise equivalent to their Java SDK counterparts in
>>>> functionality.
>>>> * Testing: Python SDK implements ValidatesRunner test framework for
>>>> implementing integration tests for current and future runners. There
>>>> is unit test coverage for all modules, and a number of integration
>>>> tests for validating existing runners.
>>>> * Documentation and examples: Documentation work has started on
>>>> Python SDK. Beam Programming Guide page has been updated to include
>>>> Python [2]. The code comes with many ready-to-use examples and we are
>>>> in a good place to start documenting those on the website.
>>>>
>>>> ** We are not done yet, next on the roadmap we have:
>>>>
>>>> * Streaming: Both of the existing runners lack support for streaming
>>>> execution, and currently there is work going on for adding streaming
>>>> support to DirectRunner [3].
>>>> * Documentation: Filling the rest of the Beam documentation with
>>>> Python SDK specific information and examples.
>>>> * SDK consistency: Making Python SDK consistent with the Java SDK. We
>>>> have come a long way on this and have only a few items left [4].
>>>> * Beamifying: We have been working on removing Dataflow-specific
>>>> references both from the documentation and from the code. There is
>>>> some work left, and we are currently working on those as well [5].
>>>>
>>>> ** Steps and implications of merging to master:
>>>>
>>>> * Master branch is merged to python-sdk branch at regular intervals
>>>> and the last merge was on 12/22.
>>>> All the past merges were uneventful because there is a minimal
>>>> overlap in modified files between branches. Integrating python-sdk to
>>>> master will similarly touch a small number of existing files.
>>>>
>>>> * Python SDK is using the same tools for building and testing. It is
>>>> already integrated with Maven, Jenkins and Travis. Specifically the
>>>> impact to the testing infrastructure would be:
>>>> - There will be two additional test configurations in Travis. Since
>>>> Travis runs all configurations in parallel there should not be a
>>>> noticeable change in the Travis run time.
>>>> - Jenkins pre-commit test will start running the Python SDK tests. It
>>>> will add an additional 5 minutes to the completion time of pre-commit
>>>> test. Historically Python SDK tests were not flaky and did not cause
>>>> any random failures.
>>>> - Jenkins Python post-commit test is already separated from the other
>>>> post-commit tests and will continue to exist. It would not change the
>>>> testing time for any other test.
>>>>
>>>> * The release process needs to be updated to accommodate releasing
>>>> Python artifacts. Python SDK would fit in the existing release
>>>> schedule and could be released along with the Java SDK. The
>>>> additional steps would include:
>>>> - Generating Python artifacts. This could be done with a single
>>>> command using Maven today.
>>>> - Publishing the artifacts to a central repository such as PyPI.
>>>> - Updating the release guide to reflect the changes above.
>>>>
>>>> * Users: There are existing users using the Python SDK. To give a
>>>> rough estimate, a distribution of the Beam Python SDK had a total of
>>>> 23K downloads in the past 6 months [6]. Some of those users are
>>>> already engaged with the community (e.g. [7]). There might be an
>>>> increased amount of engagement from the rest of them after the merge.
>>>>
>>>> Looking forward to hearing your thoughts and comments on “graduating”
>>>> python-sdk to the master.
>>>>
>>>> Thank you,
>>>> Ahmet
>>>>
>>>> (*) Python SDK branch currently has a diverse group of contributors.
>>>> Regular contributors include Charles Chen, Chamikara Jayalath, María
>>>> García Herrero, Mark Liu, Pablo Estrada, Robert Bradshaw (Apache Beam
>>>> PMC), Sourabh Bajaj, and Vikas Kedigehalli. We have also had
>>>> contributions from Abdullah Bashir, Marco Buccini, Sergio Fernández,
>>>> Seunghyun Lee, and Younghee Kwon.
>>>>
>>>> [1] https://github.com/apache/beam/tree/python-sdk/sdks/python
>>>> [2] https://beam.apache.org/documentation/programming-guide/
>>>> [3] https://issues.apache.org/jira/browse/BEAM-1265
>>>> [4] https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20labels%20%3D%20sdk-consistency
>>>> [5] https://issues.apache.org/jira/browse/BEAM-1218
>>>> [6] https://pypi.python.org/pypi/google-cloud-dataflow/json
>>>> [7] https://issues.apache.org/jira/browse/BEAM-1251
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
