+1 to merging after 0.5.

It's on a great trajectory in terms of development and community.

On Tue, Jan 17, 2017 at 5:48 PM, Kenneth Knowles <k...@google.com.invalid>
wrote:

> Seems reasonable, and the timeline Davor suggests makes a lot of sense.
>
> On Tue, Jan 17, 2017 at 3:59 PM, Lukasz Cwik <lc...@google.com.invalid>
> wrote:
>
> > I'm also for merging to master.
> >
> > On Tue, Jan 17, 2017 at 3:39 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> > > It makes sense to merge after the 0.5.0 release.
> > >
> > > Good point, Davor: +1
> > >
> > > Regards
> > > JB
> > >
> > >
> > > On 01/17/2017 03:34 PM, Davor Bonaci wrote:
> > >
> > >> +1. I think merging to master would be an awesome next step for the
> > >> Python SDK.
> > >>
> > >> And, thanks for a great summary of the current state, roadmap, and
> > >> impact to the project as a whole -- awesome!
> > >>
> > >> Process-wise, I'd suggest starting a formal vote once this discussion
> > >> seems to be trending towards a conclusion, and completing the merge as
> > >> soon as the next release (0.5.0) is cut. This would leave additional
> > >> time before 0.6.0 to figure out compliance, release process impact, etc.
> > >>
> > >> Great work everyone!
> > >>
> > >> On Tue, Jan 17, 2017 at 8:26 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> > >> wrote:
> > >>
> > >>> Hi
> > >>>
> > >>> I haven't tried the Python SDK recently, but you provided a clear "state
> > >>> of the art". Anyway, I'm in favor of merging things as quickly as
> > >>> possible (assuming it's in good shape in terms of build, test, ...): it
> > >>> would potentially grow the "external" contributions.
> > >>>
> > >>> So +1 from my side.
> > >>>
> > >>> Regards
> > >>> JB
> > >>>
> > >>> On Jan 17, 2017, at 08:22, Ahmet Altay <al...@google.com.INVALID>
> > >>> wrote:
> > >>>
> > >>>> Hi all,
> > >>>>
> > >>>> tl;dr: I would like to start a discussion about merging the python-sdk
> > >>>> branch into the master branch. The Python SDK is mature enough, and
> > >>>> merging it to master will accelerate its development and adoption.
> > >>>>
> > >>>> With a great effort from a lot of contributors (*), the Python SDK [1]
> > >>>> is now a mostly complete, tested, performant Python implementation of
> > >>>> the Beam model. Since June, when we first started with the Python SDK in
> > >>>> Apache Beam, we have been continuously improving it.
> > >>>>
> > >>>> ** Python SDK currently supports:
> > >>>>
> > >>>> * Model: All main concepts are present (ParDo, GroupByKey, Windowing
> > >>>> etc.).
> > >>>> * IO: There are extensible APIs for writing new bounded sources and
> > >>>> sinks. Implementations are provided for Text, Avro, BigQuery, and
> > >>>> Datastore.
> > >>>> * Runners: The Python SDK has an extensible base runner module that
> > >>>> allows building specific runners on top of it. The SDK comes with two
> > >>>> pipeline runners, DirectRunner and DataflowRunner, and it is possible to
> > >>>> add more. The existing runners are currently limited to bounded
> > >>>> execution but are otherwise equivalent in functionality to their Java
> > >>>> SDK counterparts.
> > >>>> * Testing: The Python SDK implements the ValidatesRunner test framework
> > >>>> for writing integration tests for current and future runners. There is
> > >>>> unit test coverage for all modules, and a number of integration tests
> > >>>> for validating the existing runners.
> > >>>> * Documentation and examples: Documentation work has started on the
> > >>>> Python SDK. The Beam Programming Guide page has been updated to include
> > >>>> Python [2]. The code comes with many ready-to-use examples, and we are
> > >>>> in a good place to start documenting those on the website.
> > >>>>
> > >>>> ** We are not done yet; next on the roadmap we have:
> > >>>>
> > >>>> * Streaming: Both of the existing runners lack support for streaming
> > >>>> execution, and work is currently under way to add streaming support to
> > >>>> the DirectRunner [3].
> > >>>> * Documentation: Filling in the rest of the Beam documentation with
> > >>>> Python SDK-specific information and examples.
> > >>>> * SDK consistency: Making the Python SDK consistent with the Java SDK.
> > >>>> We have come a long way on this and have only a few items left [4].
> > >>>> * Beamifying: We have been working on removing Dataflow-specific
> > >>>> references from both the documentation and the code. There is some work
> > >>>> left, and we are currently working on that as well [5].
> > >>>>
> > >>>> ** Steps and implications of merging to master:
> > >>>>
> > >>>> * The master branch is merged into the python-sdk branch at regular
> > >>>> intervals, and the last merge was on 12/22. All past merges were
> > >>>> uneventful because there is minimal overlap in modified files between
> > >>>> the branches. Integrating python-sdk into master will similarly touch a
> > >>>> small number of existing files.
> > >>>>
> > >>>> * The Python SDK uses the same tools for building and testing. It is
> > >>>> already integrated with Maven, Jenkins and Travis. Specifically, the
> > >>>> impact on the testing infrastructure would be:
> > >>>> - There will be two additional test configurations in Travis. Since
> > >>>> Travis runs all configurations in parallel, there should not be a
> > >>>> noticeable change in the Travis run time.
> > >>>> - The Jenkins pre-commit test will start running the Python SDK tests.
> > >>>> It will add an additional 5 minutes to the completion time of the
> > >>>> pre-commit test. Historically, the Python SDK tests have not been flaky
> > >>>> and have not caused any random failures.
> > >>>> - The Jenkins Python post-commit test is already separated from the
> > >>>> other post-commit tests and will continue to exist. It will not change
> > >>>> the testing time for any other test.
> > >>>>
> > >>>> * The release process needs to be updated to accommodate releasing
> > >>>> Python artifacts. The Python SDK would fit in the existing release
> > >>>> schedule and could be released along with the Java SDK. The additional
> > >>>> steps would include:
> > >>>> - Generating Python artifacts. This could be done with a single command
> > >>>> using Maven today.
> > >>>> - Publishing the artifacts to a central repository such as PyPI.
> > >>>> - Updating the release guide to reflect the changes above.
> > >>>>
> > >>>> * Users: There are existing users of the Python SDK. To give a rough
> > >>>> estimate, a distribution of the Beam Python SDK had a total of 23K
> > >>>> downloads in the past 6 months [6]. Some of those users are already
> > >>>> engaged with the community (e.g. [7]). There might be an increased
> > >>>> amount of engagement from the rest of them after the merge.
> > >>>>
> > >>>> Looking forward to hearing your thoughts and comments on “graduating”
> > >>>> python-sdk to master.
> > >>>>
> > >>>> Thank you,
> > >>>> Ahmet
> > >>>>
> > >>>> (*) The Python SDK branch currently has a diverse group of
> > >>>> contributors. Regular contributors include Charles Chen, Chamikara
> > >>>> Jayalath, María García Herrero, Mark Liu, Pablo Estrada, Robert Bradshaw
> > >>>> (Apache Beam PMC), Sourabh Bajaj, and Vikas Kedigehalli. We have also
> > >>>> had contributions from Abdullah Bashir, Marco Buccini, Sergio Fernández,
> > >>>> Seunghyun Lee, and Younghee Kwon.
> > >>>>
> > >>>> [1] https://github.com/apache/beam/tree/python-sdk/sdks/python
> > >>>> [2] https://beam.apache.org/documentation/programming-guide/
> > >>>> [3] https://issues.apache.org/jira/browse/BEAM-1265
> > >>>> [4] https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20labels%20%3D%20sdk-consistency
> > >>>> [5] https://issues.apache.org/jira/browse/BEAM-1218
> > >>>> [6] https://pypi.python.org/pypi/google-cloud-dataflow/json
> > >>>> [7] https://issues.apache.org/jira/browse/BEAM-1251
> > >>>>
> > >>>
> > >>>
> > >>
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> > >
> >
>
