Re: [PROPOSAL] Python 3 support

Robert Bradshaw Fri, 30 Mar 2018 09:01:55 -0700

On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <[email protected]>
wrote:


> Thanks Ahmet and Robert,
>
> I think we can work on different subpackages in parallel, but it's
> important to apply the same strategy everywhere. I'm currently working on
> applying steps 1 (mostly done already) and 2 of the proposal to the
> coders subpackage to create a first pull request. We can then discuss the
> applied strategy in detail before merging and applying it to the other
> subpackages.
>

Sounds good. Again, could you document (somewhere more permanent and easier
to look up than email) when packages are started/done?


> This strategy also includes the choice of automated tools. I'm focusing on
> writing python 3 code with python 2 compatibility, which means depending on
> the future package instead of the six package (which is already used in
> some places in the current code base). I have already noticed that this
> indeed requires a lot of manual work after running the automated script.
> The future package supports python 3.3+ compatibility, so I don't think
> there is a higher cost supporting 3.4 compared to 3.5+.
>

Sure. It may incur a higher maintenance burden long-term though.
(Basically, if we go out the door with 3.4 it's a promise to support it for
some time to come.)


> I have already added a tox environment to run pylint2 with the --py3k
> argument per updated subpackage, which should help avoid regressions between
> step 2 and step 3 of the proposal. This update will be pushed with the
> first pull request.
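The per-subpackage check described above can be sketched as a plain Python helper. The actual tox configuration is not shown in this thread, so this is not it. The sketch assumes a Python 2 interpreter named `python2` with pylint installed, because the `--py3k` porting mode is meant to run under Python 2. The subpackage paths are hypothetical examples.

```python
import subprocess


def py3k_lint_commands(subpackages, python="python2"):
    """Build one `pylint --py3k` command per already-ported subpackage."""
    # sorted() keeps the order stable so CI logs are easy to compare.
    return [[python, "-m", "pylint", "--py3k", pkg] for pkg in sorted(subpackages)]


def run_py3k_lint(subpackages, python="python2"):
    """Run each command and return the subpackages whose check failed."""
    failed = []
    for cmd in py3k_lint_commands(subpackages, python):
        if subprocess.call(cmd) != 0:
            failed.append(cmd[-1])
    return failed
```

A CI step would call `run_py3k_lint(...)` with the list of updated subpackages and fail if the returned list is non-empty.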
>
> Kind regards,
> Robbe
>
>
> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <[email protected]> wrote:
>
>> Thank you, Robbe, for your offer to help with contributions here. I read
>> over your doc and the one thing I'd like to add is that this work is very
>> parallelizable, but if we have enough people looking at it we'll want some
>> way to coordinate so as to not overlap work (or just waste time discovering
>> what's been done). Tracking individual JIRAs and PRs gets unwieldy; perhaps
>> a spreadsheet with modules/packages on one axis and the various
>> automated/manual conversions along the other would be helpful?
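One way to seed such a spreadsheet is to generate the module-by-conversion grid and import it. The minimal sketch below uses the standard `csv` module. The step columns are placeholders for whatever conversions the proposal document defines, and the module names are examples.

```python
import csv
import io

# Placeholder column names; the real steps are defined in the proposal document.
STEPS = ["step 1", "step 2", "step 3"]


def tracking_csv(modules, status):
    """Render a modules x steps grid as CSV text.

    `status` maps (module, step) to a short note such as an owner or "done";
    missing entries are left blank.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["module"] + STEPS)
    for module in modules:
        writer.writerow([module] + [status.get((module, step), "") for step in STEPS])
    return out.getvalue()
```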
>>
>> A note on automated tools: they're sometimes overly conservative, so we
>> should be sure to review the changes manually. (A typical example of this
>> is unnecessarily importing six.moves.xrange when there was no big reason to
>> use xrange over range in Python 2, or conversely using list(range(...)) in
>> Python 3.)
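The following example illustrates the kind of manual review meant here. The first function needs neither `six.moves.xrange` nor a `list()` wrapper. The second is a case where `list(range(...))` is genuinely required on Python 3. The function names are illustrative.

```python
import random


def sum_of_squares(n):
    # Iterating over range() behaves the same on Python 2 and 3; for small n
    # the list that Python 2 builds is cheap, so importing xrange adds nothing.
    return sum(i * i for i in range(n))


def shuffled_indices(n, seed=None):
    # shuffle() mutates its argument in place, so a real list is needed here;
    # this is where list(range(n)) is justified on Python 3.
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return indices
```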
>>
>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
>> there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
>> identify it and decide that before widespread announcement.
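Here are two concrete examples of what supporting 3.4 costs compared with 3.5+. The extended unpacking syntax from PEP 448 and `%`-formatting of `bytes` from PEP 461 both arrived in Python 3.5, so code that must also run on 3.4 has to use the longer spellings below. The helper names are hypothetical.

```python
def merge_options(defaults, overrides):
    # On 3.5+ this could be {**defaults, **overrides} (PEP 448), which is a
    # SyntaxError on 3.4 and 2.7, so the portable form copies and updates.
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def int_to_ascii_bytes(n):
    # b'%d' % n works on 2.7 and 3.5+ (PEP 461) but raises TypeError on 3.4,
    # so format as text first and then encode.
    return ('%d' % n).encode('ascii')
```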
>>
>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <[email protected]> wrote:
>>
>>>
>>>
>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <[email protected]>
>>> wrote:
>>>
>>>>
>>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Anand,
>>>>>
>>>>> Thanks for the feedback.
>>>>>
>>>>> It should be no problem to run everything on DataflowRunner as well.
>>>>> Are there any performance tests in place to check for performance
>>>>> regressions?
>>>>>
>>>>
>>> Yes, there is a suite (
>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>>> It may not be very comprehensive and has been failing for a while. I
>>> would not block Python 3 work on performance for now. That is the
>>> unfortunate state of things.
>>>
>>> If anybody in the community is interested, this would be a great
>>> opportunity to help with benchmarks in general.
>>>
>>>
>>>>
>>>>> Some questions were raised in the proposal document which I want to
>>>>> add to this conversation:
>>>>>
>>>>> The first comment was about the targeted python 3 versions. We
>>>>> proposed to target 3.6 since it is the latest version available and added
>>>>> 3.5 because 3.6 adoption seems rather low (hard to find any relevant
>>>>> sources on this though).
>>>>> If the beam community prefers 3.4, I would propose to target 3.4 only
>>>>> during porting and add 3.5 and 3.6 later so we don't slow down the porting
>>>>> progress. 3.4 has the advantage of already being installed on the workers
>>>>> and allows pySpark pipelines to be moved over to beam more easily.
>>>>> It would be great to get some opinions on this.
>>>>>
>>>>
>>> My preference is to support 3.4+. I searched a bit on the web to
>>> understand the usage statistics for python 3, it seems like python 3.4 has
>>> ~20% usage and python 3.4+ has 99% (
>>> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>>> Based on that, I think it makes sense to support it.
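If 3.4+ is the target, a package can make the supported range explicit. Below is a minimal sketch of a runtime check, assuming the support matrix discussed in this thread (2.7 plus 3.4 and newer). The function name is hypothetical.

```python
import sys

# Assumed support matrix from this discussion: Python 2.7, or Python 3.4+.
MIN_PY2 = (2, 7)
MIN_PY3 = (3, 4)


def is_supported(version_info=None):
    """Return True if the given (or running) interpreter version is supported."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    if major_minor[0] == 2:
        return major_minor >= MIN_PY2
    return major_minor >= MIN_PY3
```

For installation, setuptools' `python_requires` argument can express the same range declaratively, e.g. `'>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*'`.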
>>>
>>>
>>>
>>>>
>>>>> Another comment was made on how to avoid regressions during the porting
>>>>> process.
>>>>> After applying step 1 and step 2, no python 3 compatibility lint
>>>>> warnings should remain, so it would be great if we could enforce this
>>>>> check for every pull request on an already updated subpackage.
>>>>> After applying step 3, all tests should run on python 3, so again it
>>>>> would be great if we can enforce these per updated subpackage.
>>>>> Any insights on how to best accomplish this?
>>>>>
>>>> So you can look at some of the recent changes to tox.ini in the git log
>>>> to see what we’ve done so far around this; I suspect you can repeat that
>>>> same pattern.
>>>>
>>>
>>> +1, updating tox.ini and adding new checks to run_mini_py3lint.sh would
>>> help a lot to prevent regressions.
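The enforcement rule discussed above could be expressed as a small function that a CI script calls with a PR's changed files. Under that rule, lint must stay clean for subpackages that finished steps 1 and 2, and Python 3 tests must pass for those that finished step 3. This is a minimal sketch. The check names and paths are hypothetical and are not taken from the existing tox.ini or run_mini_py3lint.sh.

```python
def _in_package(path, package):
    # Match the package directory itself or any file below it, but not a
    # sibling such as "apache_beam/coders_extra".
    package = package.rstrip("/")
    return path == package or path.startswith(package + "/")


def required_py3_checks(changed_files, lint_clean, py3_tested):
    """Return the set of (check, subpackage) pairs a pull request must pass.

    lint_clean: subpackages that completed steps 1 and 2 (no py3k lint warnings).
    py3_tested: subpackages that completed step 3 (tests run on Python 3).
    """
    checks = set()
    for path in changed_files:
        for package in lint_clean:
            if _in_package(path, package):
                checks.add(("py3k-lint", package))
        for package in py3_tested:
            if _in_package(path, package):
                checks.add(("py3-tests", package))
    return checks
```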
>>>
>>>
>>>
>>>>
>>>>> Thanks,
>>>>> Robbe
>>>>>
>>>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <[email protected]> wrote:
>>>>>
>>>>>> Thank you Robbe.
>>>>>>
>>>>>> I reviewed the document; it looks reasonable to me. I will touch on
>>>>>> some points that were not mentioned:
>>>>>> - Runners exercise different code paths. Doing auto conversions and
>>>>>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>>>>>> DataflowRunner as well. This can be triggered from Jenkins. It will
>>>>>> validate that we are still compatible with Python 2.
>>>>>> - Similar to above but with an eye on perf regressions.
>>>>>>
>>>>>> For project tracking on JIRA, please feel free to create any new
>>>>>> issues, close stale ones, or take ownership of any open issues. All JIRAs
>>>>>> should be assigned to the people actively working on them. If you want to
>>>>>> track it in a separate way, you can also propose that. (For example a
>>>>>> kanban board is used for portability effort which is fully supported in
>>>>>> JIRA.)
>>>>>>
>>>>>> I will also call out to a few other people in addition to Holden who
>>>>>> helped out or showed interest in helping with Python 3: @cclaus,
>>>>>> @luke-zhu, @udim, @robertwb, @charlesccychen, @tvalentyn. You can
>>>>>> include these people (and myself) for reviews and other questions that
>>>>>> you have.
>>>>>>
>>>>>> Welcome again, and looking forward to your contributions.
>>>>>>
>>>>>> Thank you,
>>>>>> Ahmet
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> In the next month(s), my colleague Matthias and I will commit a lot
>>>>>>> of time and effort to Python 3 support for Beam, and we would like to
>>>>>>> discuss the best way to go forward with this.
>>>>>>>
>>>>>>> We have drawn up a document [1] with a high level outline of the
>>>>>>> proposed approach and would like to get your feedback on this.
>>>>>>>
>>>>>>> The main Jira issue [2] for python 3 support has been mostly
>>>>>>> inactive for the past year. Other smaller issues have been opened, but it's
>>>>>>> hard to track the general progress. It would be great if anyone could offer
>>>>>>> some insights on how to best handle this project on Jira.
>>>>>>>
>>>>>>> @Holden Karau, you seem to have already put in a lot of effort to
>>>>>>> add python 3 support, so it would be great to get your insights and find a
>>>>>>> way to merge our efforts.
>>>>>>> way to merge our efforts.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Robbe
>>>>>>>
>>>>>>> [1]
>>>>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>>>
>>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>>>>> --
>>>>>>>
>>>>>>> [image: https://ml6.eu] <https://ml6.eu/>
>>>>>>>
>>>>>>> * Robbe Sneyders*
>>>>>>>
>>>>>>> ML6 Gent
>>>>>>> <https://www.google.be/maps/place/ML6/@51.037408,3.7044893,17z/data=!3m1!4b1!4m5!3m4!1s0x47c37161feeca14b:0xb8f72585fdd21c90!8m2!3d51.037408!4d3.706678?hl=nl>
>>>>>>>
>>>>>>> M: +32 474 71 31 08
>>>>>>>
>>>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>>
>>>
