I don't use gradle commands for Python development either, because they are
slow (no incremental testing).



On Fri, Mar 29, 2019 at 9:16 AM Michael Luckey <adude3...@gmail.com> wrote:

>
>
> On Fri, Mar 29, 2019 at 2:31 PM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> On Fri, Mar 29, 2019 at 12:54 PM Michael Luckey <adude3...@gmail.com>
>> wrote:
>> >
>> > Really like the idea of improving here.
>> >
>> > Unfortunately, I haven't worked with python on that scale yet, so bear
>> with my naive understandings in this regard. If I understand correctly, the
>> suggestion will result in a couple of projects consisting only of a
>> build,gradle file to kind of workaround on gradles decision not to
>> parallelize within projects, right? In consequence, this also kind of
>> decouples projects from their content - they stuff which constitutes the
>> project - and forces the build file to 'somehow reach out to content of
>> other (only python root?) projects. E.g couples projects. This somehow
>> 'feels non natural' to me. But, of course, might be the path to go. As I
>> said before, never worked on python on that scale.
>>
>> It feels a bit odd to me as well. Is it possible to have multiple
>> projects per directory (e.g. a suite of testing ones) rather than
>> having to break things up like this, especially if the goal is
>> primarily to get parallel running of tests? Especially if we could
>> automatically create the cross-product rather than manually? There
>> also seems to be some redundancy with what tox is doing here.
>>
>
> Not sure, whether I understand correctly. But I do not think that's
> possible. If we are going to do some cross-product, we are probably better
> of doing that on tasks, e.g. by leveraging task rules or programmatically
> adding tasks (which is already done in parts). Of course, this will not
> help with parallelisation (but might enable that, see below).
>
>
>>
>> > But I believe to remember Robert talking about using in project
>> parallelisation for his development. Is this something which could also
>> work on CI? Of course, that will not help with different python versions,
>> but maybe that could be solved also by gradles variants which are
>> introduced in 5.3 - definitely need some time to investigate the
>> possibilities here. On first sight it feels like lots of duplication to
>> create 'builds' for any python version. Or wouldn't that be the case?
>> >
>> > And another naive thought on my side, isn't that non parallelizability
>> also caused by the monolithic setup of the python code base? E.g. if I
>> understand correctly, java sdk is split into core/runners/ios etc, each
>> encapsulate into full blown projects, i.e. buckets of sources, tests and
>> build file. Would it technically possible to do something similar with
>> python? I assume that being discussed before and teared apart, but couldn't
>> find on mailing list.
>>
>> Neither the culture nor the tooling of Python supports lots of
>> interdependent "sub-packages" for a single project--at least not
>> something smaller than one would want to deploy to Pypi. So while one
>> could do this, it'd be going against the grain. There are also much
>> lower-hanging opportunities for parallelization (e.g. running the test
>> suites for separate python versions in parallel).
>>
>> It's not very natural (as I understand it) with Go either. If we're
>> talking directory re-organization, I think it would make sense to
>> consider having top-level java, python, go, ... next to model,
>> website, etc.
>>
>
> Yes. We shouldn't work against common culture/practices, but try to
> embrace native tooling and add support where required.
>
> To reiterate on parallelisation, there are (at least) three opportunities:
>
> 1. Parallelise on test level. For python, this is detox?
>

This is actually 2 levels. :)
1a. Parallelise at the nosetest level - run unit tests in parallel in a
single tox environment. (I have a PR in progress to migrate to pytest, and
we should be able to do file-level parallelism provided we solve pickling
issues.)
1b. Parallelise at the tox environment level, e..g, somehow running the
multiple tox environments (py27,py27-cython,py35,...) in parallel.


> 2. Parallelise on Gradle project level (--parallel option)
> 3. Parallelise on CI level
>
> So what I ve done before, if 1. does not help, nor 2. cause project is
> 'just to big' was 3, i.e. splitting on CI level. So the simplest thing I
> could imagine right now would be - as suggested above - to split
> pythonPreCommit into something like pythonX_YPrecommit, which then runs
> those different python versions in parallel. Of course, that could be done
> ad infinitum by splitting further into runners, IOs whatever.
>
> OTOH, we do already have tons of jobs running as we seem to map Gradle
> tasks to Jenkins jobs. So it might be more appropriate to leverage CI
> parallelization on jenkins pipeline level. Something like creating
> pythonPrecommit as a pipeline, which itself runs several steps in parallel.
> Did not contemplate on the impacts, though, on PRs and phrase triggering.
> Of course, we might need to improve on our 'verbosity' as currently it
> seems not always clear what command line actually triggered etc. With this
> approach, it seems possible to run tasks within same Gradle project in
> parallel with whatever granularity we might want to choose. (Of course we
> do add overhead here, but that's probably always the case by doing things
> in parallel)
>
> Of course 3. would not help on parallelisation on developers machine, but
> I never had a problem with that.
>
>
>>
>> > And as a last thought, will usage of pygradle help with better
>> python/gradle integration? Currently, we mainly use gradle to call into
>> shell scripts, which doesn't help gradle nor probably pythons tooling to do
>> the job very well? But deeper integration might cause problems on python
>> dev side, dunno :(
>>
>> Possibly.
>>
>> Are there any Python developers that primarily use the gradle
>> commands? Personally, I only use them if I'm using Java (or sometimes
>> work that is a mix of Java and Python, e.g. the Python-on-Flink
>> tests). Otherwise I use tox, or "python setup.py test [-s ...]"
>> directly. Gradle primarily has value as a top-level orchestration (in
>> particular for CI) and easy way for those who only touch Python
>> occasionally to run all the tests. If that's the case, optimizing our
>> gradle scripts for CI seems best.
>>
>
> My impression till today is, that nobody is actually using Gradle on daily
> python development. So I assume it is currently used on CI only. But I
> believe gradle could add value for devs also. We just need to improve our
> current integration. And to be very clear, gradle should not replace python
> tooling, but support usage in a way which looks familiar to any python dev.
> And probably render some shell scripting obsolete.
>
>
>>
>> > On Thu, Mar 28, 2019 at 6:37 PM Mark Liu <mark...@apache.org> wrote:
>> >>
>> >> Thank you Ahmet. Answer your questions below:
>> >>
>> >>>
>> >>> - Could you comment on what kind of parallelization we will gain by
>> this? In terms of real numbers, how would this affect build and test times?
>> >>
>> >>
>> >> The proposal is based on Gradle parallel execution: "you can force
>> Gradle to execute tasks in parallel as long as those tasks are in different
>> projects". In Beam, project is declared per build.gradle file and
>> registered in settings.gradle. Tasks that are included in single Gradle
>> execution will run in parallel only if they are declared in separate
>> build.gradle files.
>> >>
>> >> An example of applying parallel is beam_PreCommit_Python job which
>> runs :pythonPreCommit task that contains tasks distributed in 4
>> build.gradle. The execution graph looks like
>> https://scans.gradle.com/s/4frpmto6o7hto/timeline:
>> >>
>> >> Without this proposal, all tasks will run in sequential which can be
>> ~2x longer. If more py36 and py37 tests added in the future, things will be
>> even worse.
>> >>
>> >>> - I am guessing this will reduce complexity. Is it possible to
>> quantify the improvement related to this?
>> >>
>> >>
>> >> The general code complexity of function/method/property may not change
>> here since we basically group tasks in a different way without changing
>> inside logic. I don't know if there is any tool to measure Gradle build
>> complexity. Would love to try if there is.
>> >>
>> >>>
>> >>> - Beyond the proposal, I am assuming you are willing to work on. Just
>> want to clarify this. In either case, would you need help?
>> >>
>> >>
>> >> Yes, I'd love to take on major refactor works. At the same time, I'll
>> create jira for each kind of tests (like flink/protable/hdfs tests) in
>> sdks/python/build.gradle to move into test-suites. Test owners or anyone
>> interested to this work are welcome to contribute!
>> >>
>> >> Mark
>> >>
>> >> On Wed, Mar 27, 2019 at 3:53 PM Ahmet Altay <al...@google.com> wrote:
>> >>>
>> >>> This sounds good to me. Thank you for doing this. Few questions:
>> >>> - Could you comment on what kind of parallelization we will gain by
>> this? In terms of real numbers, how would this affect build and test times?
>> >>> - I am guessing this will reduce complexity. Is it possible to
>> quantify the improvement related to this?
>> >>> - Beyond the proposal, I am assuming you are willing to work on. Just
>> want to clarify this. In either case, would you need help?
>> >>>
>> >>> Thank you,
>> >>> Ahmet
>> >>>
>> >>> On Wed, Mar 27, 2019 at 10:19 AM Mark Liu <mark...@apache.org> wrote:
>> >>>>
>> >>>> Hi Python SDK Developers,
>> >>>>
>> >>>> You may notice that Gradle files changed a lot recently as
>> parallelization applied to Python tests and more python versions were
>> enabled in testing. There are tricks over the build scripts and tests are
>> grown naturally and distributed under sdks/python, which caused frictions
>> (like rollback PR-8059).
>> >>>>
>> >>>> Thus, I created BEAM-6907 and would like to initiate some works to
>> cleanup and standardize Gradle structure in Python SDK. In general, I think
>> we want to:
>> >>>>
>> >>>> - Apply parallel execution
>> >>>> - Share common tasks
>> >>>> - Centralize test related tasks
>> >>>> - Have a clear Gradle structure for projects/tasks
>> >>>>
>> >>>> This is Gradle directory structure I proposed:
>> >>>>
>> >>>> sdks/python/
>> >>>>
>> >>>> build.gradle    --> hold builds, snapshot, analytic tasks
>> >>>> test-suites/    --> all pre/post/VR test suites under here
>> >>>>
>> >>>> README.md
>> >>>>
>> >>>> dataflow/    --> grouped by runner or unit test (tox)
>> >>>>
>> >>>> py27/    --> grouped by py version
>> >>>>
>> >>>> build.gradle
>> >>>>
>> >>>> py35/
>> >>>>
>> >>>> ...
>> >>>>
>> >>>> direct/
>> >>>>
>> >>>> py27/
>> >>>>
>> >>>> ...
>> >>>>
>> >>>> flink/
>> >>>>
>> >>>> tox/
>> >>>> ...
>> >>>>
>> >>>>
>> >>>> The ideas are:
>> >>>> - Only keep builds, snapshot and analytic jobs in
>> sdks/python/build.gradle
>> >>>> - Move all test related tasks to sdks/python/test-suites/
>> >>>> - In sdks/python/test-suites, we first group by runners, unit test
>> or other testing that can't fit to them, and then group by py versions if
>> needed.
>> >>>> - An example of ../test-suites/../py35/build.gradle is this.
>> >>>>
>> >>>> Please feel free to explore existing Gradle scripts in Python SDK
>> and bring any thoughts on this proposal if you have.
>> >>>>
>> >>>> Thanks!
>> >>>> Mark
>>
>

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature

Reply via email to