Thanks Ahmet and Robert,

I think we can work on different subpackages in parallel, but it's
important to apply the same strategy everywhere. I'm currently applying
steps 1 (which was mostly done already) and 2 of the proposal to the
coders subpackage to create a first pull request. We can then discuss the
applied strategy in detail before merging and applying it to the other
subpackages.

This strategy also includes the choice of automated tools. I'm focusing on
writing python 3 code with python 2 compatibility, which means depending on
the future package instead of the six package (which is already used in
some places in the current code base). I have already noticed that this
indeed requires a lot of manual work after running the automated script.
The future package supports python 3.3+ compatibility, so I don't think
there is a higher cost supporting 3.4 compared to 3.5+.
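
As a rough illustration of what "python 3 code with python 2
compatibility" can look like, here is a minimal sketch. It assumes the
future package is installed on python 2, where it supplies the builtins
module; on python 3 builtins is part of the standard library. The helper
names (chunk_sizes, mean, describe) are made up for this example and are
not Beam code.

```python
from __future__ import absolute_import, division, print_function

# Python 3: standard library module. Python 2: provided by the `future`
# package, which backports the Python 3 versions of these types.
from builtins import range, str


def chunk_sizes(total, parts):
    """Split `total` items into `parts` near-equal chunk sizes (hypothetical helper)."""
    base = total // parts      # explicit floor division, same result on 2 and 3
    remainder = total % parts
    return [base + (1 if i < remainder else 0) for i in range(parts)]


def mean(values):
    """True division on both versions thanks to the __future__ import."""
    return sum(values) / len(values)


def describe(sizes):
    """Join sizes into text using the Python 3 style `str` type."""
    return str(", ").join(str(s) for s in sizes)


if __name__ == "__main__":
    print(describe(chunk_sizes(10, 3)))
```

The code itself is written the python 3 way (range, str, true division)
and the imports at the top carry the python 2 compatibility, which is the
opposite of wrapping python 2 idioms with six.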

I have already added a tox environment to run pylint2 with the --py3k
argument per updated subpackage, which should help avoid regression between
step 2 and step 3 of the proposal. This update will be pushed with the
first pull request.
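
To make the per-subpackage check concrete, here is a small sketch of how
the lint command could be assembled. This is not the actual tox
configuration from the pull request: the executable name "pylint", the
"apache_beam" root directory and the function names are assumptions made
for illustration. Only the --py3k argument comes from the setup described
above.

```python
import subprocess

# Subpackages that have already gone through steps 1 and 2 (assumed list).
UPDATED_SUBPACKAGES = ["coders"]


def py3k_lint_command(subpackages, root="apache_beam"):
    """Build the pylint argv that checks only the updated subpackages."""
    paths = ["{}/{}".format(root, name) for name in sorted(subpackages)]
    return ["pylint", "--py3k"] + paths


def run_py3k_lint(subpackages):
    """Run the check and return pylint's exit code (non-zero on warnings)."""
    return subprocess.call(py3k_lint_command(subpackages))


if __name__ == "__main__":
    print(" ".join(py3k_lint_command(UPDATED_SUBPACKAGES)))
```

Each newly ported subpackage would then only need to be appended to the
list for the check to cover it.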

Kind regards,
Robbe


On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <rober...@google.com> wrote:

> Thank you, Robbe, for your offer to help with contribution here. I read
> over your doc and the one thing I'd like to add is that this work is very
> parallelizable, but if we have enough people looking at it we'll want some
> way to coordinate so as to not overlap work (or just waste time discovering
> what's been done). Tracking individual JIRAs and PRs gets unwieldy; perhaps
> a spreadsheet with modules/packages on one axis and the various
> automated/manual conversions along the other would be helpful?
>
> A note on automated tools: they're sometimes overly conservative, so we
> should be sure to review the changes manually. (A typical example of this
> is unnecessarily importing six.moves.xrange when there was no big reason to
> use xrange over range in Python 2, or conversely using list(range(...)) in
> Python 3.)
>
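
A small sketch of the range point above, for reference during manual
review. The function names are hypothetical and the code relies only on
the standard library. It shows where plain range is enough on both
versions and where the list(...) wrapper is actually needed.

```python
import random


def squares(n):
    # Plain range: a list on Python 2, a lazy sequence on Python 3.
    # For small n the Python 2 list is cheap, so six.moves.xrange adds nothing.
    return [i * i for i in range(n)]


def first_and_last(n):
    # Indexing works on the Python 2 list and on the Python 3 range object,
    # so an automated list(range(...)) rewrite is unnecessary here.
    r = range(n)
    return r[0], r[-1]


def shuffled_indices(n, seed=0):
    # list(...) is required only when the sequence is mutated:
    # shuffle works in place and a Python 3 range is immutable.
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return indices
```
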
> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
> there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
> identify it and decide on that before any widespread announcement.
>
> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com> wrote:
>
>>
>>
>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <hol...@pigscanfly.ca>
>> wrote:
>>
>>>
>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <robbe.sneyd...@ml6.eu>
>>> wrote:
>>>
>>>> Hi Anand,
>>>>
>>>> Thanks for the feedback.
>>>>
>>>> It should be no problem to run everything on DataflowRunner as well.
>>>> Are there any performance tests in place to check for performance
>>>> regressions?
>>>>
>>>
>> Yes, there is a suite (
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>> It may not be very comprehensive and seems to have been failing for a
>> while. I would not block python 3 work on performance for now. That is the
>> unfortunate state of things.
>>
>> If anybody in the community is interested, this would be a great
>> opportunity to help with benchmarks in general.
>>
>>
>>>
>>>> Some questions were raised in the proposal document which I want to add
>>>> to this conversation:
>>>>
>>>> The first comment was about the targeted python 3 versions. We proposed
>>>> to target 3.6 since it is the latest version available and added 3.5
>>>> because 3.6 adoption seems rather low (hard to find any relevant sources on
>>>> this though).
>>>> If the beam community prefers 3.4, I would propose to target 3.4 only
>>>> during porting and add 3.5 and 3.6 later so we don't slow down the porting
>>>> progress. 3.4 has the advantage of already being installed on the workers
>>>> and allows pySpark pipelines to be moved over to beam more easily.
>>>> It would be great to get some opinions on this.
>>>>
>>>
>> My preference is to support 3.4+. I searched a bit on the web to
>> understand the usage statistics for python 3. It seems that python 3.4 has
>> ~20% usage and python 3.4+ has 99% (
>> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>> Based on that, I think it makes sense to support it.
>>
>>
>>
>>>
>>>> Another comment was made on how to avoid regression during the porting
>>>> progress.
>>>> After applying step 1 and step 2, no python 3 compatibility lint
>>>> warnings should remain, so it would be great if we could enforce this check
>>>> for every pull request on an already updated subpackage.
>>>> After applying step 3, all tests should run on python 3, so again it
>>>> would be great if we can enforce these per updated subpackage.
>>>> Any insights on how to best accomplish this?
>>>>
>>> You can look at some of the recent changes to tox.ini in the git log
>>> to see what we’ve done so far around this. I suspect you can repeat that
>>> same pattern.
>>>
>>
>> +1 updating tox.ini and adding new checks to run_mini_py3lint.sh would
>> help a lot to prevent regressions.
>>
>>
>>
>>>
>>>> Thanks,
>>>> Robbe
>>>>
>>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> Thank you Robbe.
>>>>>
>>>>> I reviewed the document and it looks reasonable to me. I will touch on
>>>>> some points that were not mentioned:
>>>>> - Runners exercise different code paths. Doing auto conversions and
>>>>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>>>>> DataflowRunner as well. This can be triggered from Jenkins. It will
>>>>> validate that we are still compatible with python 2.
>>>>> - Similar to above but with an eye on perf regressions.
>>>>>
>>>>> For project tracking on JIRA, please feel free to create any new
>>>>> issues, close stale ones, or take ownership of any open issues. All JIRAs
>>>>> should be assigned to the people actively working on them. If you want to
>>>>> track it in a separate way, you can also propose that. (For example, a
>>>>> kanban board, which is fully supported in JIRA, is used for the
>>>>> portability effort.)
>>>>>
>>>>> I will also call out to a few other people in addition to Holden who
>>>>> helped out or showed interest in helping with Python 3: @cclaus,
>>>>> @luke-zhu, @udim, @robertwb, @charlesccychen, @tvalentyn. You can include
>>>>> these people (and myself) for reviews and other questions that you have.
>>>>>
>>>>> Welcome again, and looking forward to your contributions.
>>>>>
>>>>> Thank you,
>>>>> Ahmet
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <robbe.sneyd...@ml6.eu>
>>>>> wrote:
>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> In the next month(s), my colleague Matthias and I will commit a lot
>>>>>> of time and effort to python 3 support for beam and we would like to
>>>>>> discuss the best way to go forward with this.
>>>>>>
>>>>>> We have drawn up a document [1] with a high level outline of the
>>>>>> proposed approach and would like to get your feedback on this.
>>>>>>
>>>>>> The main Jira issue [2] for python 3 support has been mostly inactive
>>>>>> for the past year. Other smaller issues have been opened, but it's hard
>>>>>> to track the general progress. It would be great if anyone could offer
>>>>>> some insights on how to best handle this project on Jira.
>>>>>>
>>>>>> @Holden Karau, you seem to have already put in a lot of effort to add
>>>>>> python 3 support, so it would be great to get your insights and find a
>>>>>> way to merge our efforts.
>>>>>>
>>>>>> Kind regards,
>>>>>> Robbe
>>>>>>
>>>>>> [1]
>>>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>>
>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>>>> --
>>>>>>
>>>>>> [image: https://ml6.eu] <https://ml6.eu/>
>>>>>>
>>>>>> * Robbe Sneyders*
>>>>>>
>>>>>> ML6 Gent
>>>>>> <https://www.google.be/maps/place/ML6/@51.037408,3.7044893,17z/data=!3m1!4b1!4m5!3m4!1s0x47c37161feeca14b:0xb8f72585fdd21c90!8m2!3d51.037408!4d3.706678?hl=nl>
>>>>>>
>>>>>> M: +32 474 71 31 08 <+32%20474%2071%2031%2008>
>>>>>>
>>>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>