Hello Robert,

I think a Kanban board on Jira as proposed by Ahmet can be helpful for
this. I'll look into setting one up tomorrow.

In the meantime, you can find the first pull request with the updated
coders package here:
https://github.com/apache/beam/pull/4990

Kind regards,
Robbe

On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw <rober...@google.com> wrote:

> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <robbe.sneyd...@ml6.eu>
> wrote:
>
>> Thanks Ahmet and Robert,
>>
>> I think we can work on different subpackages in parallel, but it's
>> important to apply the same strategy everywhere. I'm currently working on
>> applying step 1 (was mostly done already) and 2 of the proposal to the
>> coders subpackage to create a first pull request. We can then discuss the
>> applied strategy in detail before merging and applying it to the other
>> subpackages.
>>
>
> Sounds good. Again, could you document (in a more permanent/easy to look
> up state than email) when packages are started/done?
>
>
>> This strategy also includes the choice of automated tools. I'm focusing
>> on writing python 3 code with python 2 compatibility, which means depending
>> on the future package instead of the six package (which is already used in
>> some places in the current code base). I have already noticed that this
>> indeed requires a lot of manual work after running the automated script.
>> The future package supports python 3.3+ compatibility, so I don't think
>> there is a higher cost supporting 3.4 compared to 3.5+.
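To illustrate the "python 3 code with python 2 compatibility" style described above, here is a minimal sketch. The function mean_length is hypothetical and not taken from the Beam code base. It assumes that on Python 2 the future package is installed, since that package supplies a builtins module there; on Python 3, builtins is part of the standard library.

```python
from __future__ import absolute_import, division, print_function

# On Python 3 `builtins` is in the standard library; on Python 2 the
# `future` package provides a module with the same name (assumption:
# `future` is installed when running on Python 2).
from builtins import range, str


def mean_length(words):
    """Return the average length of the given words as a float.

    Hypothetical helper, used only to show the coding style.
    """
    total = 0
    for i in range(len(words)):
        total += len(str(words[i]))
    # True division on both Python 2 and 3 thanks to the __future__ import.
    return total / len(words)


print(mean_length(["ab", "abc"]))
```

With six, the same loop would typically import range from six.moves instead, so the code reads as Python 2 with shims rather than as plain Python 3.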
>>
>
> Sure. It may incur a higher maintenance burden long-term though.
> (Basically, if we go out the door with 3.4 it's a promise to support it for
> some time to come.)
>
>
>> I have already added a tox environment to run pylint2 with the --py3k
>> argument per updated subpackage, which should help avoid regression between
>> step 2 and step 3 of the proposal. This update will be pushed with the
>> first pull request.
>>
>> Kind regards,
>> Robbe
>>
>>
>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <rober...@google.com> wrote:
>>
>>> Thank you, Robbe, for your offer to help with contribution here. I read
>>> over your doc and the one thing I'd like to add is that this work is very
>>> parallelizable, but if we have enough people looking at it we'll want some
>>> way to coordinate so as to not overlap work (or just waste time discovering
>>> what's been done). Tracking individual JIRAs and PRs gets unwieldy; perhaps
>>> a spreadsheet with modules/packages on one axis and the various
>>> automated/manual conversions along the other would be helpful?
>>>
>>> A note on automated tools: they're sometimes overly conservative, so we
>>> should be sure to review the changes manually. (A typical example of this
>>> is unnecessarily importing six.moves.xrange when there was no big reason to
>>> use xrange over range in Python 2, or conversely using list(range(...)) in
>>> Python 3.)
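A small sketch of the kind of manual review meant here, using two hypothetical functions that are not from the Beam code base. It shows one case where plain range() is enough on both Python versions and one where wrapping it in list() is actually needed.

```python
def squares(n):
    """Iteration only: plain range() is enough on Python 2 and 3.

    An automated tool might rewrite this to list(range(n)) or import
    six.moves.xrange, but neither is needed just to loop over the values.
    """
    return [i * i for i in range(n)]


def indices_with_sentinel(n):
    """The result is mutated, so a real list is required on Python 3.

    On Python 3, range() returns a range object without append(), so
    the list() wrapper is justified in this case.
    """
    indices = list(range(n))
    indices.append(-1)
    return indices


print(squares(4))
print(indices_with_sentinel(3))
```

The point is only that the list() wrapper should be kept where the value is used as a list, and dropped where the code merely iterates.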
>>>
>>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
>>> there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
>>> identify it and decide that before widespread announcement.
>>>
>>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <hol...@pigscanfly.ca>
>>>> wrote:
>>>>
>>>>>
>>>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <robbe.sneyd...@ml6.eu>
>>>>> wrote:
>>>>>
>>>>>> Hi Anand,
>>>>>>
>>>>>> Thanks for the feedback.
>>>>>>
>>>>>> It should be no problem to run everything on DataflowRunner as well.
>>>>>> Are there any performance tests in place to check for performance
>>>>>> regressions?
>>>>>>
>>>>>
>>>> Yes there is a suite (
>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>>>> It may not be very comprehensive and seems to have been failing for a while. I
>>>> would not block python 3 work on performance for now. That is the
>>>> unfortunate state of things.
>>>>
>>>> If anybody in the community is interested, this would be a great
>>>> opportunity to help with benchmarks in general.
>>>>
>>>>
>>>>>
>>>>>> Some questions were raised in the proposal document which I want to
>>>>>> add to this conversation:
>>>>>>
>>>>>> The first comment was about the targeted python 3 versions. We
>>>>>> proposed to target 3.6 since it is the latest version available and added
>>>>>> 3.5 because 3.6 adoption seems rather low (hard to find any relevant
>>>>>> sources on this though).
>>>>>> If the beam community prefers 3.4, I would propose to target 3.4 only
>>>>>> during porting and add 3.5 and 3.6 later so we don't slow down the
>>>>>> porting progress. 3.4 has the advantage of already being installed on
>>>>>> the workers and allows pySpark pipelines to be moved over to beam more
>>>>>> easily.
>>>>>> It would be great to get some opinions on this.
>>>>>>
>>>>>
>>>> My preference is to support 3.4+. I searched a bit on the web to
>>>> understand the usage statistics for python 3, it seems like python 3.4 has
>>>> ~20% usage and python 3.4+ has 99% (
>>>> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>>>> Based on that, I think it makes sense to support it.
>>>>
>>>>
>>>>
>>>>>
>>>>>> Another comment was made on how to avoid regression during the
>>>>>> porting progress.
>>>>>> After applying step 1 and step 2, no python 3 compatibility lint
>>>>>> warnings should remain, so it would be great if we could enforce this
>>>>>> check for every pull request on an already updated subpackage.
>>>>>> After applying step 3, all tests should run on python 3, so again it
>>>>>> would be great if we can enforce these per updated subpackage.
>>>>>> Any insights on how to best accomplish this?
>>>>>>
>>>>> So you can look at some of the recent changes to tox.ini in the git
>>>>> log to see what we’ve done so far around this. I suspect you can repeat
>>>>> that same pattern.
>>>>>
>>>>
>>>> +1 updating tox.ini and adding new checks to run_mini_py3lint.sh would
>>>> help a lot to prevent regressions.
>>>>
>>>>
>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Robbe
>>>>>>
>>>>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:
>>>>>>
>>>>>>> Thank you Robbe.
>>>>>>>
>>>>>>> I reviewed the document and it looks reasonable to me. I will touch on
>>>>>>> some points that were not mentioned:
>>>>>>> - Runners exercise different code paths. Doing auto conversions and
>>>>>>> focusing on DirectRunner is not enough. It is worthwhile to run things
>>>>>>> on DataflowRunner as well. This can be triggered from Jenkins. It will
>>>>>>> validate that we are still compatible with python 2.
>>>>>>> - Similar to above but with an eye on perf regressions.
>>>>>>>
>>>>>>> For project tracking on JIRA, please feel free to create any new
>>>>>>> issues, close stale ones, or take ownership of any open issues. All
>>>>>>> JIRAs should be assigned to the people actively working on them. If you
>>>>>>> want to track it in a separate way, you can also propose that. (For
>>>>>>> example, a kanban board, which is fully supported in JIRA, is used for
>>>>>>> the portability effort.)
>>>>>>>
>>>>>>> I will also call out to a few other people in addition to Holden who
>>>>>>> helped out or showed interest in helping with Python 3. @cclaus,
>>>>>>> @luke-zhu, @udim, @robertwb, @charlesccychen, @tvalentyn. You can
>>>>>>> include these people (and myself) for reviews and other questions that
>>>>>>> you have.
>>>>>>>
>>>>>>> Welcome again, and looking forward to your contributions.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Ahmet
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <
>>>>>>> robbe.sneyd...@ml6.eu> wrote:
>>>>>>>
>>>>>>>> Hello everyone,
>>>>>>>>
>>>>>>>> In the next month(s), my colleague Matthias and I will commit a
>>>>>>>> lot of time and effort to python 3 support for beam and we would like
>>>>>>>> to discuss the best way to go forward with this.
>>>>>>>>
>>>>>>>> We have drawn up a document [1] with a high level outline of the
>>>>>>>> proposed approach and would like to get your feedback on this.
>>>>>>>>
>>>>>>>> The main Jira issue [2] for python 3 support has been mostly
>>>>>>>> inactive for the past year. Other smaller issues have been opened, but
>>>>>>>> it's hard to track the general progress. It would be great if anyone
>>>>>>>> could offer some insights on how to best handle this project on Jira.
>>>>>>>>
>>>>>>>> @Holden Karau, you seem to have already put in a lot of effort to
>>>>>>>> add python 3 support, so it would be great to get your insights and
>>>>>>>> find a way to merge our efforts.
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Robbe
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>>>>
>>>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>>>>>> --
>>>>>>>>
>>>>>>>> [image: https://ml6.eu] <https://ml6.eu/>
>>>>>>>>
>>>>>>>> * Robbe Sneyders*
>>>>>>>>
>>>>>>>> ML6 Gent
>>>>>>>> <https://www.google.be/maps/place/ML6/@51.037408,3.7044893,17z/data=!3m1!4b1!4m5!3m4!1s0x47c37161feeca14b:0xb8f72585fdd21c90!8m2!3d51.037408!4d3.706678?hl=nl>
>>>>>>>>
>>>>>>>> M: +32 474 71 31 08 <+32%20474%2071%2031%2008>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>
>>>>
>>>> --
>>
> --
