Hello, Python team and other nice people.

I'd like to discuss the topic of parallel runs again. While it sounded like a good idea at first, I have my doubts now. I'll try to briefly describe the implementation, then go over its advantages and disadvantages.
Implementation
--------------

Right now, parallel runs are a feature of multibuild.eclass, which in turn uses multiprocessing.eclass. It's a somewhat hacky implementation in bash, but it works. However, calling 'die' inside such an implementation is illegal per PMS, and the Council is not interested in changing that even though we're doing it a lot :).

The parallel support is implemented in python-r1 through the python_parallel_foreach_impl function. This function is in turn used to implement parallel running of sub-phases in distutils-r1.

Adv. and disadv. of parallel phases now
---------------------------------------

Advantages:

- speedup of non-parallel build tasks -- compiling Python modules, extensions (before Python 3.4 [?]), running 2to3. The latter tends to take a lot of CPU time while utilizing only one core of a modern CPU. Running it in parallel for a few impls makes it possible to utilize the full power of the CPU.

- speedup of PyPy phase runs -- PyPy and PyPy3 take quite a long time to start. By spawning their phases first, in parallel with the CPython runs, we can speed the build up a bit. The idea is that implementations which usually take longer to build are spawned first, so that all cores of the machine are kept busy for as long as possible.

- catching silly assumptions in build systems -- a lot of build systems write to random locations and expect the files not to be touched by anything else.

Disadvantages:

- conflicts with parallel parts of the build -- I think Python 3.4's distutils is capable of building extensions in parallel [can we backport that?]. The same goes for nosetests and possibly other tools.

- possibility of high resource usage -- this applies especially to test suites, which weren't written with the assumption that someone would be running, say, 4 instances of them in parallel.

- necessity of fighting build system bugs -- it's rather common for tests and builds to write files in the source directory or temporary directory without proper unique naming.
  Long story short, we need to work around that a lot to keep the tests from failing randomly and to make the build install the correct files (e.g. without mixing implementations).

- some developers are surprised that variables set inside sub-phases are not preserved in the global scope (because the sub-phases run in subshells).

What if we disabled it?
-----------------------

Advantages:

- the eclass becomes a bit simpler and loses the dependency on multiprocessing (well, it will still be inherited implicitly, but not used).

- developers no longer have to fix all the upstream build system failures.

- resource-consuming and parallel parts of the build no longer have to be hacked to avoid issues with multiprocessing.

- we comply with PMS again.

Disadvantages:

- 2to3 and pure Python module build/install steps will be noticeably slower and less efficient (especially for PyPy and PyPy3).

- some ebuilds may have to be modified because developers assumed that changes made within a sub-phase (global variables, working directories) would not affect the successive phases.

What are your thoughts?

-- 
Best regards,
Michał Górny
