On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <robbe.sneyd...@ml6.eu> wrote:
> Thanks Ahmet and Robert,
>
> I think we can work on different subpackages in parallel, but it's
> important to apply the same strategy everywhere. I'm currently working on
> applying steps 1 (mostly done already) and 2 of the proposal to the
> coders subpackage to create a first pull request. We can then discuss the
> applied strategy in detail before merging and applying it to the other
> subpackages.
>

Sounds good. Again, could you document (in a more permanent/easy to look up state than email) when packages are started/done?

> This strategy also includes the choice of automated tools. I'm focusing on
> writing python 3 code with python 2 compatibility, which means depending on
> the future package instead of the six package (which is already used in
> some places in the current code base). I have already noticed that this
> indeed requires a lot of manual work after running the automated script.
> The future package supports python 3.3+ compatibility, so I don't think
> there is a higher cost supporting 3.4 compared to 3.5+.
>

Sure. It may incur a higher maintenance burden long-term though. (Basically, if we go out the door with 3.4 it's a promise to support it for some time to come.)

> I have already added a tox environment to run pylint2 with the --py3k
> argument per updated subpackage, which should help avoid regressions between
> step 2 and step 3 of the proposal. This update will be pushed with the
> first pull request.
>
> Kind regards,
> Robbe
>
>
> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <rober...@google.com> wrote:
>
>> Thank you, Robbe, for your offer to help with contribution here. I read
>> over your doc and the one thing I'd like to add is that this work is very
>> parallelizable, but if we have enough people looking at it we'll want some
>> way to coordinate so as to not overlap work (or just waste time discovering
>> what's been done).
>> Tracking individual JIRAs and PRs gets unwieldy; perhaps
>> a spreadsheet with modules/packages on one axis and the various
>> automated/manual conversions along the other would be helpful?
>>
>> A note on automated tools: they're sometimes overly conservative, so we
>> should be sure to review the changes manually. (A typical example of this
>> is unnecessarily importing six.moves.xrange when there was no big reason to
>> use xrange over range in Python 2, or conversely using list(range(...)) in
>> Python 3.)
>>
>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
>> there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
>> identify it and decide that before widespread announcement.
>>
>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com> wrote:
>>
>>>
>>>
>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <hol...@pigscanfly.ca>
>>> wrote:
>>>
>>>>
>>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <robbe.sneyd...@ml6.eu>
>>>> wrote:
>>>>
>>>>> Hi Anand,
>>>>>
>>>>> Thanks for the feedback.
>>>>>
>>>>> It should be no problem to run everything on DataflowRunner as well.
>>>>> Are there any performance tests in place to check for performance
>>>>> regressions?
>>>>>
>>>>
>>> Yes, there is a suite (
>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>>> It may not be very comprehensive and seems to have been failing for a
>>> while. I would not block python 3 work on performance for now. That is the
>>> unfortunate state of things.
>>>
>>> If anybody in the community is interested, this would be a great
>>> opportunity to help with benchmarks in general.
>>>
>>>
>>>>
>>>>> Some questions were raised in the proposal document which I want to
>>>>> add to this conversation:
>>>>>
>>>>> The first comment was about the targeted python 3 versions.
>>>>> We proposed to target 3.6 since it is the latest version available and
>>>>> added 3.5 because 3.6 adoption seems rather low (hard to find any relevant
>>>>> sources on this though).
>>>>> If the beam community prefers 3.4, I would propose to target 3.4 only
>>>>> during porting and add 3.5 and 3.6 later so we don't slow down the porting
>>>>> progress. 3.4 has the advantage of already being installed on the workers
>>>>> and allows pySpark pipelines to be moved over to beam more easily.
>>>>> It would be great to get some opinions on this.
>>>>>
>>>>
>>> My preference is to support 3.4+. I searched a bit on the web to
>>> understand the usage statistics for python 3; it seems like python 3.4 has
>>> ~20% usage and python 3.4+ has 99% (
>>> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>>> Based on that, I think it makes sense to support it.
>>>
>>>
>>>
>>>>
>>>>> Another comment was made on how to avoid regressions during the porting
>>>>> process.
>>>>> After applying step 1 and step 2, no python 3 compatibility lint
>>>>> warnings should remain, so it would be great if we could enforce this
>>>>> check for every pull request on an already updated subpackage.
>>>>> After applying step 3, all tests should run on python 3, so again it
>>>>> would be great if we can enforce these per updated subpackage.
>>>>> Any insights on how to best accomplish this?
>>>>>
>>>> So you can look at some of the recent changes to tox.ini in the git log
>>>> to see what we’ve done so far around this. I suspect you can repeat that
>>>> same pattern.
>>>>
>>>
>>> +1, updating tox.ini and adding new checks to run_mini_py3lint.sh would
>>> help a lot to prevent regressions.
>>>
>>>
>>>
>>>>
>>>>> Thanks,
>>>>> Robbe
>>>>>
>>>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> Thank you Robbe.
>>>>>>
>>>>>> I reviewed the document; it looks reasonable to me.
>>>>>> I will touch on some points that were not mentioned:
>>>>>> - Runners exercise different code paths. Doing auto conversions and
>>>>>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>>>>>> DataflowRunner as well. This can be triggered from Jenkins. It will
>>>>>> validate that we are still compatible with python 2.
>>>>>> - Similar to above but with an eye on perf regressions.
>>>>>>
>>>>>> For project tracking on JIRA, please feel free to create any new
>>>>>> issues, close stale ones, or take ownership of any open issues. All JIRAs
>>>>>> should be assigned to the people actively working on them. If you want to
>>>>>> track it in a separate way, you can also propose that. (For example, a
>>>>>> kanban board, which is fully supported in JIRA, is used for the
>>>>>> portability effort.)
>>>>>>
>>>>>> I will also call out to a few other people in addition to Holden who
>>>>>> helped out or showed interest in helping with Python 3: @cclaus,
>>>>>> @luke-zhu, @udim, @robertwb, @charlesccychen, @tvalentyn. You can
>>>>>> include these people (and myself) for reviews and other questions that
>>>>>> you have.
>>>>>>
>>>>>> Welcome again, and looking forward to your contributions.
>>>>>>
>>>>>> Thank you,
>>>>>> Ahmet
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <
>>>>>> robbe.sneyd...@ml6.eu> wrote:
>>>>>>
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> In the next month(s), my colleague Matthias and I will commit a lot
>>>>>>> of time and effort to python 3 support for beam and we would like to
>>>>>>> discuss the best way to go forward with this.
>>>>>>>
>>>>>>> We have drawn up a document [1] with a high-level outline of the
>>>>>>> proposed approach and would like to get your feedback on this.
>>>>>>>
>>>>>>> The main Jira issue [2] for python 3 support has been mostly
>>>>>>> inactive for the past year. Other smaller issues have been opened, but
>>>>>>> it's hard to track the general progress.
>>>>>>> It would be great if anyone could offer
>>>>>>> some insights on how to best handle this project on Jira.
>>>>>>>
>>>>>>> @Holden Karau, you seem to have already put in a lot of effort to
>>>>>>> add python 3 support, so it would be great to get your insights and
>>>>>>> find a way to merge our efforts.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Robbe
>>>>>>>
>>>>>>> [1]
>>>>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>>>
>>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>>>>> --
>>>>>>>
>>>>>>> [image: https://ml6.eu] <https://ml6.eu/>
>>>>>>>
>>>>>>> *Robbe Sneyders*
>>>>>>>
>>>>>>> ML6 Gent
>>>>>>> <https://www.google.be/maps/place/ML6/@51.037408,3.7044893,17z/data=!3m1!4b1!4m5!3m4!1s0x47c37161feeca14b:0xb8f72585fdd21c90!8m2!3d51.037408!4d3.706678?hl=nl>
>>>>>>>
>>>>>>> M: +32 474 71 31 08
>>>>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>>
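
To illustrate the note in the thread about automated conversions being overly conservative (importing six.moves.xrange, or wrapping every range in list(...)), here is a minimal sketch. It is not code from the Beam code base; the function names sum_of_squares and first_indices are hypothetical and exist only for this example. It assumes the loops are small enough that the list Python 2 builds for range is harmless.

```python
from __future__ import print_function


def sum_of_squares(n):
    # Hypothetical example: plain range is fine for iteration on both
    # Python 2 and Python 3, so no six.moves.xrange import is needed.
    total = 0
    for i in range(n):
        total += i * i
    return total


def first_indices(n):
    # Hypothetical example: list(...) is only needed when a real list is
    # required, here because the result is concatenated with another list.
    return list(range(n)) + [n]


print(sum_of_squares(4))
print(first_indices(3))
```

A manual review pass after running the automated script would keep the first form as it is and keep list(range(...)) only in cases like the second.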