I don't use gradle commands for Python development either, because they are slow (no incremental testing).
On Fri, Mar 29, 2019 at 9:16 AM Michael Luckey <adude3...@gmail.com> wrote: > > > On Fri, Mar 29, 2019 at 2:31 PM Robert Bradshaw <rober...@google.com> > wrote: > >> On Fri, Mar 29, 2019 at 12:54 PM Michael Luckey <adude3...@gmail.com> >> wrote: >> > >> > Really like the idea of improving here. >> > >> > Unfortunately, I haven't worked with python on that scale yet, so bear >> with my naive understandings in this regard. If I understand correctly, the >> suggestion will result in a couple of projects consisting only of a >> build,gradle file to kind of workaround on gradles decision not to >> parallelize within projects, right? In consequence, this also kind of >> decouples projects from their content - they stuff which constitutes the >> project - and forces the build file to 'somehow reach out to content of >> other (only python root?) projects. E.g couples projects. This somehow >> 'feels non natural' to me. But, of course, might be the path to go. As I >> said before, never worked on python on that scale. >> >> It feels a bit odd to me as well. Is it possible to have multiple >> projects per directory (e.g. a suite of testing ones) rather than >> having to break things up like this, especially if the goal is >> primarily to get parallel running of tests? Especially if we could >> automatically create the cross-product rather than manually? There >> also seems to be some redundancy with what tox is doing here. >> > > Not sure, whether I understand correctly. But I do not think that's > possible. If we are going to do some cross-product, we are probably better > of doing that on tasks, e.g. by leveraging task rules or programmatically > adding tasks (which is already done in parts). Of course, this will not > help with parallelisation (but might enable that, see below). > > >> >> > But I believe to remember Robert talking about using in project >> parallelisation for his development. Is this something which could also >> work on CI? Of course, that will not help with different python versions, >> but maybe that could be solved also by gradles variants which are >> introduced in 5.3 - definitely need some time to investigate the >> possibilities here. On first sight it feels like lots of duplication to >> create 'builds' for any python version. Or wouldn't that be the case? >> > >> > And another naive thought on my side, isn't that non parallelizability >> also caused by the monolithic setup of the python code base? E.g. if I >> understand correctly, java sdk is split into core/runners/ios etc, each >> encapsulate into full blown projects, i.e. buckets of sources, tests and >> build file. Would it technically possible to do something similar with >> python? I assume that being discussed before and teared apart, but couldn't >> find on mailing list. >> >> Neither the culture nor the tooling of Python supports lots of >> interdependent "sub-packages" for a single project--at least not >> something smaller than one would want to deploy to Pypi. So while one >> could do this, it'd be going against the grain. There are also much >> lower-hanging opportunities for parallelization (e.g. running the test >> suites for separate python versions in parallel). >> >> It's not very natural (as I understand it) with Go either. If we're >> talking directory re-organization, I think it would make sense to >> consider having top-level java, python, go, ... next to model, >> website, etc. >> > > Yes. We shouldn't work against common culture/practices, but try to > embrace native tooling and add support where required. > > To reiterate on parallelisation, there are (at least) three opportunities: > > 1. Parallelise on test level. For python, this is detox? > This is actually 2 levels. :) 1a. Parallelise at the nosetest level - run unit tests in parallel in a single tox environment. (I have a PR in progress to migrate to pytest, and we should be able to do file-level parallelism provided we solve pickling issues.) 1b. Parallelise at the tox environment level, e..g, somehow running the multiple tox environments (py27,py27-cython,py35,...) in parallel. > 2. Parallelise on Gradle project level (--parallel option) > 3. Parallelise on CI level > > So what I ve done before, if 1. does not help, nor 2. cause project is > 'just to big' was 3, i.e. splitting on CI level. So the simplest thing I > could imagine right now would be - as suggested above - to split > pythonPreCommit into something like pythonX_YPrecommit, which then runs > those different python versions in parallel. Of course, that could be done > ad infinitum by splitting further into runners, IOs whatever. > > OTOH, we do already have tons of jobs running as we seem to map Gradle > tasks to Jenkins jobs. So it might be more appropriate to leverage CI > parallelization on jenkins pipeline level. Something like creating > pythonPrecommit as a pipeline, which itself runs several steps in parallel. > Did not contemplate on the impacts, though, on PRs and phrase triggering. > Of course, we might need to improve on our 'verbosity' as currently it > seems not always clear what command line actually triggered etc. With this > approach, it seems possible to run tasks within same Gradle project in > parallel with whatever granularity we might want to choose. (Of course we > do add overhead here, but that's probably always the case by doing things > in parallel) > > Of course 3. would not help on parallelisation on developers machine, but > I never had a problem with that. > > >> >> > And as a last thought, will usage of pygradle help with better >> python/gradle integration? Currently, we mainly use gradle to call into >> shell scripts, which doesn't help gradle nor probably pythons tooling to do >> the job very well? But deeper integration might cause problems on python >> dev side, dunno :( >> >> Possibly. >> >> Are there any Python developers that primarily use the gradle >> commands? Personally, I only use them if I'm using Java (or sometimes >> work that is a mix of Java and Python, e.g. the Python-on-Flink >> tests). Otherwise I use tox, or "python setup.py test [-s ...]" >> directly. Gradle primarily has value as a top-level orchestration (in >> particular for CI) and easy way for those who only touch Python >> occasionally to run all the tests. If that's the case, optimizing our >> gradle scripts for CI seems best. >> > > My impression till today is, that nobody is actually using Gradle on daily > python development. So I assume it is currently used on CI only. But I > believe gradle could add value for devs also. We just need to improve our > current integration. And to be very clear, gradle should not replace python > tooling, but support usage in a way which looks familiar to any python dev. > And probably render some shell scripting obsolete. > > >> >> > On Thu, Mar 28, 2019 at 6:37 PM Mark Liu <mark...@apache.org> wrote: >> >> >> >> Thank you Ahmet. Answer your questions below: >> >> >> >>> >> >>> - Could you comment on what kind of parallelization we will gain by >> this? In terms of real numbers, how would this affect build and test times? >> >> >> >> >> >> The proposal is based on Gradle parallel execution: "you can force >> Gradle to execute tasks in parallel as long as those tasks are in different >> projects". In Beam, project is declared per build.gradle file and >> registered in settings.gradle. Tasks that are included in single Gradle >> execution will run in parallel only if they are declared in separate >> build.gradle files. >> >> >> >> An example of applying parallel is beam_PreCommit_Python job which >> runs :pythonPreCommit task that contains tasks distributed in 4 >> build.gradle. The execution graph looks like >> https://scans.gradle.com/s/4frpmto6o7hto/timeline: >> >> >> >> Without this proposal, all tasks will run in sequential which can be >> ~2x longer. If more py36 and py37 tests added in the future, things will be >> even worse. >> >> >> >>> - I am guessing this will reduce complexity. Is it possible to >> quantify the improvement related to this? >> >> >> >> >> >> The general code complexity of function/method/property may not change >> here since we basically group tasks in a different way without changing >> inside logic. I don't know if there is any tool to measure Gradle build >> complexity. Would love to try if there is. >> >> >> >>> >> >>> - Beyond the proposal, I am assuming you are willing to work on. Just >> want to clarify this. In either case, would you need help? >> >> >> >> >> >> Yes, I'd love to take on major refactor works. At the same time, I'll >> create jira for each kind of tests (like flink/protable/hdfs tests) in >> sdks/python/build.gradle to move into test-suites. Test owners or anyone >> interested to this work are welcome to contribute! >> >> >> >> Mark >> >> >> >> On Wed, Mar 27, 2019 at 3:53 PM Ahmet Altay <al...@google.com> wrote: >> >>> >> >>> This sounds good to me. Thank you for doing this. Few questions: >> >>> - Could you comment on what kind of parallelization we will gain by >> this? In terms of real numbers, how would this affect build and test times? >> >>> - I am guessing this will reduce complexity. Is it possible to >> quantify the improvement related to this? >> >>> - Beyond the proposal, I am assuming you are willing to work on. Just >> want to clarify this. In either case, would you need help? >> >>> >> >>> Thank you, >> >>> Ahmet >> >>> >> >>> On Wed, Mar 27, 2019 at 10:19 AM Mark Liu <mark...@apache.org> wrote: >> >>>> >> >>>> Hi Python SDK Developers, >> >>>> >> >>>> You may notice that Gradle files changed a lot recently as >> parallelization applied to Python tests and more python versions were >> enabled in testing. There are tricks over the build scripts and tests are >> grown naturally and distributed under sdks/python, which caused frictions >> (like rollback PR-8059). >> >>>> >> >>>> Thus, I created BEAM-6907 and would like to initiate some works to >> cleanup and standardize Gradle structure in Python SDK. In general, I think >> we want to: >> >>>> >> >>>> - Apply parallel execution >> >>>> - Share common tasks >> >>>> - Centralize test related tasks >> >>>> - Have a clear Gradle structure for projects/tasks >> >>>> >> >>>> This is Gradle directory structure I proposed: >> >>>> >> >>>> sdks/python/ >> >>>> >> >>>> build.gradle --> hold builds, snapshot, analytic tasks >> >>>> test-suites/ --> all pre/post/VR test suites under here >> >>>> >> >>>> README.md >> >>>> >> >>>> dataflow/ --> grouped by runner or unit test (tox) >> >>>> >> >>>> py27/ --> grouped by py version >> >>>> >> >>>> build.gradle >> >>>> >> >>>> py35/ >> >>>> >> >>>> ... >> >>>> >> >>>> direct/ >> >>>> >> >>>> py27/ >> >>>> >> >>>> ... >> >>>> >> >>>> flink/ >> >>>> >> >>>> tox/ >> >>>> ... >> >>>> >> >>>> >> >>>> The ideas are: >> >>>> - Only keep builds, snapshot and analytic jobs in >> sdks/python/build.gradle >> >>>> - Move all test related tasks to sdks/python/test-suites/ >> >>>> - In sdks/python/test-suites, we first group by runners, unit test >> or other testing that can't fit to them, and then group by py versions if >> needed. >> >>>> - An example of ../test-suites/../py35/build.gradle is this. >> >>>> >> >>>> Please feel free to explore existing Gradle scripts in Python SDK >> and bring any thoughts on this proposal if you have. >> >>>> >> >>>> Thanks! >> >>>> Mark >> >
smime.p7s
Description: S/MIME Cryptographic Signature