On Fri, Mar 29, 2019 at 12:54 PM Michael Luckey <adude3...@gmail.com> wrote:
>
> Really like the idea of improving here.
>
> Unfortunately, I haven't worked with Python on that scale yet, so bear with
> my naive understanding in this regard. If I understand correctly, the
> suggestion will result in a couple of projects consisting only of a
> build.gradle file, essentially to work around Gradle's decision not to
> parallelize tasks within a project, right? As a consequence, this also
> decouples projects from their content (the stuff that constitutes the
> project) and forces each build file to somehow reach into the content of
> other (only the Python root?) projects, i.e. it couples projects. This
> feels somewhat unnatural to me. But, of course, it might be the way to go.
> As I said before, I've never worked with Python at that scale.

It feels a bit odd to me as well. Is it possible to have multiple
projects per directory (e.g. a suite of testing ones) rather than
having to break things up like this, especially if the goal is
primarily to run tests in parallel? Especially if we could create
the cross-product automatically rather than by hand? There also
seems to be some redundancy with what tox is doing here.
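As a rough illustration of creating the cross-product automatically, here is a minimal Python sketch that derives one Gradle project path per (runner, Python version) pair. The runner and version lists are assumptions for illustration only, and the `:sdks:python:test-suites:<runner>:<pyXY>` path scheme simply mirrors the test-suites layout in Mark's proposal; the `include` lines it renders are what would otherwise be hand-written in settings.gradle.

```python
from itertools import product

# Illustrative lists only (assumptions); the real sets would come from the build.
RUNNERS = ["dataflow", "direct", "flink"]
PY_VERSIONS = ["py27", "py35", "py36", "py37"]


def suite_projects(runners, versions):
    """Return one hypothetical Gradle project path per (runner, version) pair."""
    return [
        f":sdks:python:test-suites:{runner}:{version}"
        for runner, version in product(runners, versions)
    ]


def settings_includes(paths):
    """Render settings.gradle-style include lines for the given project paths."""
    return "\n".join(f'include "{path}"' for path in paths)


if __name__ == "__main__":
    print(settings_includes(suite_projects(RUNNERS, PY_VERSIONS)))
```

Since settings.gradle is itself a Groovy script, the same loop could in principle live there directly, avoiding a separate generation step.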

> But I believe I remember Robert talking about using in-project
> parallelisation for his development. Is this something that could also work
> on CI? Of course, that will not help with different Python versions, but
> maybe that could also be solved by Gradle's variants, which were introduced
> in 5.3 (I definitely need some time to investigate the possibilities here).
> At first sight it feels like a lot of duplication to create 'builds' for
> every Python version. Or wouldn't that be the case?
>
> And another naive thought on my side: isn't that lack of parallelizability
> also caused by the monolithic setup of the Python code base? If I understand
> correctly, the Java SDK is split into core/runners/IOs etc., each encapsulated
> in a full-blown project, i.e. a bucket of sources, tests and a build file.
> Would it be technically possible to do something similar with Python? I
> assume this has been discussed before and torn apart, but I couldn't find it
> on the mailing list.

Neither the culture nor the tooling of Python supports lots of
interdependent "sub-packages" for a single project--at least not
something smaller than one would want to deploy to PyPI. So while one
could do this, it'd be going against the grain. There are also much
lower-hanging opportunities for parallelization (e.g. running the test
suites for separate python versions in parallel).
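To make the expected gain concrete, here is a back-of-the-envelope sketch in Python, using made-up durations, of the rule Mark quotes further down the thread: tasks in the same Gradle project run one after another, while tasks in different projects may run concurrently. Assuming one worker per project and no cross-project dependencies, wall-clock time is the longest per-project total rather than the sum of all tasks. The task names, project paths and minutes are hypothetical, and because the model ignores dependencies, worker limits and overhead, it gives an optimistic upper bound on the speed-up rather than a prediction.

```python
from collections import defaultdict

# Hypothetical task -> (project, minutes) mapping; numbers are illustrative only.
TASKS = {
    "lint": (":sdks:python", 10),
    "py27:tox": (":sdks:python:test-suites:tox:py27", 25),
    "py35:tox": (":sdks:python:test-suites:tox:py35", 25),
    "py27:direct": (":sdks:python:test-suites:direct:py27", 15),
}


def sequential_minutes(tasks):
    """Wall-clock time when every task runs one after another."""
    return sum(minutes for _, minutes in tasks.values())


def per_project_parallel_minutes(tasks):
    """Wall-clock time when projects run concurrently but tasks within a
    project stay sequential (one worker per project, no cross-project deps)."""
    totals = defaultdict(int)
    for project, minutes in tasks.values():
        totals[project] += minutes
    return max(totals.values(), default=0)


if __name__ == "__main__":
    print("sequential:", sequential_minutes(TASKS), "min")
    print("per-project parallel:", per_project_parallel_minutes(TASKS), "min")
```

With these made-up numbers the sequential total is 75 minutes versus 25 minutes when each project gets its own worker; moving two tasks into the same project would add their times together again, which is why the proposal splits suites into separate build.gradle files.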

It's not very natural (as I understand it) with Go either. If we're
talking directory re-organization, I think it would make sense to
consider having top-level java, python, go, ... next to model,
website, etc.

> And as a last thought, will using pygradle help with better Python/Gradle
> integration? Currently, we mainly use Gradle to call into shell scripts,
> which probably doesn't help either Gradle or Python's tooling do the job
> very well. But deeper integration might cause problems on the Python dev
> side, dunno :(

Possibly.

Are there any Python developers that primarily use the gradle
commands? Personally, I only use them if I'm using Java (or sometimes
work that is a mix of Java and Python, e.g. the Python-on-Flink
tests). Otherwise I use tox, or "python setup.py test [-s ...]"
directly. Gradle primarily has value as a top-level orchestrator (in
particular for CI) and as an easy way for those who only touch Python
occasionally to run all the tests. If that's the case, optimizing our
Gradle scripts for CI seems best.

> On Thu, Mar 28, 2019 at 6:37 PM Mark Liu <mark...@apache.org> wrote:
>>
>> Thank you, Ahmet. Answers to your questions below:
>>
>>>
>>> - Could you comment on what kind of parallelization we will gain by this? 
>>> In terms of real numbers, how would this affect build and test times?
>>
>>
>> The proposal is based on Gradle parallel execution: "you can force Gradle to
>> execute tasks in parallel as long as those tasks are in different projects".
>> In Beam, a project is declared per build.gradle file and registered in
>> settings.gradle. Tasks included in a single Gradle execution will run
>> in parallel only if they are declared in separate build.gradle files.
>>
>> An example of parallel execution is the beam_PreCommit_Python job, which runs
>> the :pythonPreCommit task; it contains tasks distributed across 4 build.gradle
>> files. The execution graph looks like this: https://scans.gradle.com/s/4frpmto6o7hto/timeline
>>
>> Without this proposal, all tasks would run sequentially, which can take ~2x
>> longer. If more py36 and py37 tests are added in the future, things will be
>> even worse.
>>
>>> - I am guessing this will reduce complexity. Is it possible to quantify the 
>>> improvement related to this?
>>
>>
>> The code complexity of individual functions/methods/properties may not change
>> here, since we are basically regrouping tasks without changing their internal
>> logic. I don't know of any tool that measures Gradle build complexity.
>> Would love to try one if there is.
>>
>>>
>>> - Beyond the proposal, I am assuming you are willing to work on it. Just want
>>> to clarify this. In either case, would you need help?
>>
>>
>> Yes, I'd love to take on the major refactoring work. At the same time, I'll
>> create a JIRA for each kind of test (like flink/portable/hdfs tests) in
>> sdks/python/build.gradle to move it into test-suites. Test owners or anyone
>> interested in this work are welcome to contribute!
>>
>> Mark
>>
>> On Wed, Mar 27, 2019 at 3:53 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>> This sounds good to me. Thank you for doing this. A few questions:
>>> - Could you comment on what kind of parallelization we will gain by this? 
>>> In terms of real numbers, how would this affect build and test times?
>>> - I am guessing this will reduce complexity. Is it possible to quantify the 
>>> improvement related to this?
>>> - Beyond the proposal, I am assuming you are willing to work on it. Just want
>>> to clarify this. In either case, would you need help?
>>>
>>> Thank you,
>>> Ahmet
>>>
>>> On Wed, Mar 27, 2019 at 10:19 AM Mark Liu <mark...@apache.org> wrote:
>>>>
>>>> Hi Python SDK Developers,
>>>>
>>>> You may have noticed that the Gradle files changed a lot recently as
>>>> parallelization was applied to Python tests and more Python versions were
>>>> enabled in testing. There are tricks in the build scripts, and tests have
>>>> grown organically and are scattered under sdks/python, which has caused
>>>> friction (like rollback PR-8059).
>>>>
>>>> Thus, I created BEAM-6907 and would like to initiate some work to clean up
>>>> and standardize the Gradle structure in the Python SDK. In general, I think
>>>> we want to:
>>>>
>>>> - Apply parallel execution
>>>> - Share common tasks
>>>> - Centralize test related tasks
>>>> - Have a clear Gradle structure for projects/tasks
>>>>
>>>> This is the Gradle directory structure I propose:
>>>>
>>>> sdks/python/
>>>>   build.gradle        --> hold builds, snapshot, analytic tasks
>>>>   test-suites/        --> all pre/post/VR test suites under here
>>>>     README.md
>>>>     dataflow/         --> grouped by runner or unit test (tox)
>>>>       py27/           --> grouped by py version
>>>>         build.gradle
>>>>       py35/
>>>>       ...
>>>>     direct/
>>>>       py27/
>>>>       ...
>>>>     flink/
>>>>     tox/
>>>>     ...
>>>>
>>>>
>>>> The ideas are:
>>>> - Only keep build, snapshot and analytics jobs in sdks/python/build.gradle
>>>> - Move all test-related tasks to sdks/python/test-suites/
>>>> - In sdks/python/test-suites, we first group by runner, unit test, or
>>>> other testing that doesn't fit either, and then group by Python version
>>>> if needed.
>>>> - An example of ../test-suites/../py35/build.gradle is this.
>>>>
>>>> Please feel free to explore the existing Gradle scripts in the Python SDK
>>>> and share any thoughts you have on this proposal.
>>>>
>>>> Thanks!
>>>> Mark
