On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <hol...@pigscanfly.ca> wrote:
>
> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <robbe.sneyd...@ml6.eu> wrote:
>
>> Hi Anand,
>>
>> Thanks for the feedback.
>>
>> It should be no problem to run everything on DataflowRunner as well.
>> Are there any performance tests in place to check for performance
>> regressions?
>>
>
Yes, there is a suite (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy). It may not be very comprehensive, and it seems to have been failing for a while. I would not block python 3 work on performance for now. That is the unfortunate state of things. If anybody in the community is interested, this would be a great opportunity to help with benchmarks in general.

>
>> Some questions were raised in the proposal document which I want to add
>> to this conversation:
>>
>> The first comment was about the targeted python 3 versions. We proposed
>> to target 3.6 since it is the latest version available and added 3.5
>> because 3.6 adoption seems rather low (hard to find any relevant sources on
>> this though).
>> If the beam community prefers 3.4, I would propose to target 3.4 only
>> during porting and add 3.5 and 3.6 later so we don't slow down the porting
>> progress. 3.4 has the advantage of already being installed on the workers
>> and allows pySpark pipelines to be moved over to beam more easily.
>> It would be great to get some opinions on this.
>>
>
My preference is to support 3.4+. I searched a bit on the web to understand the usage statistics for python 3; it seems like python 3.4 has ~20% usage and python 3.4+ has 99% (https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html). Based on that, I think it makes sense to support it.

>
>> Another comment was made on how to avoid regression during the porting
>> progress.
>> After applying step 1 and step 2, no python 3 compatibility lint warnings
>> should remain, so it would be great if we could enforce this check for
>> every pull request on an already updated subpackage.
>> After applying step 3, all tests should run on python 3, so again it
>> would be great if we can enforce these per updated subpackage.
>> Any insights on how to best accomplish this?
>>
> So you can look at some of the recent changes to tox.ini in the git log to
> see what we’ve done so far around this; I suspect you can repeat that same
> pattern.
>
+1, updating tox.ini and adding new checks to run_mini_py3lint.sh would help a lot to prevent regressions.

>
>> Thanks,
>> Robbe
>>
>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:
>>
>>> Thank you Robbe.
>>>
>>> I reviewed the document; it looks reasonable to me. I will touch on some
>>> points that were not mentioned:
>>> - Runners exercise different code paths. Doing auto conversions and
>>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>>> DataflowRunner as well. This can be triggered from Jenkins. It will
>>> validate that we are still compatible for python 2.
>>> - Similar to above but with an eye on perf regressions.
>>>
>>> For project tracking on JIRA, please feel free to create any new issues,
>>> close stale ones, or take ownership of any open issues. All JIRAs should be
>>> assigned to the people actively working on them. If you want to track it in
>>> a separate way, you can also propose that. (For example a kanban board is
>>> used for portability effort which is fully supported in JIRA.)
>>>
>>> I will also call out to a few other people in addition to Holden who
>>> helped out or showed interest in helping with Python 3. @cclaus, @luke-zhu,
>>> @udim, @robertwb, @charlesccychen, @tvalentyn. You can include these
>>> people (and myself) for reviews and other questions that you have.
>>>
>>> Welcome again, and looking forward to your contributions.
>>>
>>> Thank you,
>>> Ahmet
>>>
>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <robbe.sneyd...@ml6.eu>
>>> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> In the next month(s), me and my colleague Matthias will commit a lot of
>>>> time and effort to python 3 support for beam and we would like to discuss
>>>> the best way to go forward with this.
>>>>
>>>> We have drawn up a document [1] with a high level outline of the
>>>> proposed approach and would like to get your feedback on this.
>>>>
>>>> The main Jira issue [2] for python 3 support has been mostly inactive
>>>> for the past year. Other smaller issues have been opened, but it's hard to
>>>> track the general progress. It would be great if anyone could offer some
>>>> insights on how to best handle this project on Jira.
>>>>
>>>> @Holden Karau, you seem to have already put in a lot of effort to add
>>>> python 3 support, so it would be great to get your insights and find a way
>>>> to merge our efforts.
>>>>
>>>> Kind regards,
>>>> Robbe
>>>>
>>>> [1] https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>> --
>>>> Robbe Sneyders
>>>> ML6 Gent
>>>> M: +32 474 71 31 08
>>>>
>>>
>
> --
> Twitter: https://twitter.com/holdenkarau