On 31 October 2017 at 05:16, RonnyPfannschmidt <opensou...@ronnypfannschmidt.de> wrote:
> Hi everyone,
>
> since a while now various details of installing python packages in
> virtualenvs caused me grief
>
> a) typically each tox folder in a project is massive, and has a lot of
> duplicate files, recreating them, managing and iterating them takes
> quite a while
> b) for nicely separated deployments, each virtualenv for an application
> takes a few hundred megabytes - that quickly can saturate disk space
> even if a reasonable amount was reserved
> c) installation and recreation of virtualenvs with the same set of
> packages takes quite a while (even with pip caches this is slow, and
> there is no good reason to avoid making it completely instantaneous)
>
> in order to alleviate those issues i would like to propose a new
> installation layout,
> where instead of storing each distribution in every python all
> distributions would share a storage, and each individual environment
> would only have references to the packages that were
> "installed/activated" for them
>

I've spent a fair bit of time pondering this problem (since distros care about it in relation to ease of security updates), and the combination of Python's import semantics with the PEP 376 installation database semantics makes it fairly tricky to improve. Fortunately, the pth-file mechanism provides an escape hatch that makes it possible to transparently experiment with different approaches.

At the venv management layer, pew already supports a model similar to that offered by the Flatpak application container format [1]: instead of attempting to share everything, pew permits a limited form of "virtual environment inheritance", via "pew add $(pew dir <named-venv-to-depend-on>)" (which injects a *.pth file that appends the other venv's site-packages directory to sys.path).
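
As a rough illustration of the *.pth mechanism described above (this is not pew's actual implementation), the sketch below writes a *.pth file into one site-packages directory so that a second one is also searched. The function name inherit_site_packages, the directory names, and the module shared_runtime_mod are all hypothetical, and throwaway directories stand in for real venvs.

```python
import site
import tempfile
from pathlib import Path


def inherit_site_packages(child_site_packages: Path,
                          parent_site_packages: Path,
                          name: str = "inherited-venv") -> Path:
    """Write a .pth file so the child directory also searches the parent.

    Hypothetical helper: each line of a .pth file that names an existing
    directory is appended to sys.path when the site module processes it.
    """
    pth_file = child_site_packages / f"{name}.pth"
    pth_file.write_text(f"{parent_site_packages}\n", encoding="utf-8")
    return pth_file


# Throwaway directories standing in for two venvs' site-packages.
with tempfile.TemporaryDirectory() as tmp:
    parent = Path(tmp) / "runtime-venv" / "site-packages"
    child = Path(tmp) / "app-venv" / "site-packages"
    parent.mkdir(parents=True)
    child.mkdir(parents=True)

    # A module that only exists in the "runtime" environment.
    (parent / "shared_runtime_mod.py").write_text("VALUE = 42\n",
                                                  encoding="utf-8")

    pth_file = inherit_site_packages(child, parent)
    pth_name = pth_file.name

    # addsitedir() adds the child directory to sys.path and processes
    # its .pth files, much as interpreter startup does for site-packages.
    site.addsitedir(str(child))

    # Importable only because the .pth file pulled in the parent directory.
    import shared_runtime_mod

    print(shared_runtime_mod.VALUE)
```

Nothing is copied in this arrangement: the application environment just gains an extra sys.path entry, which is why it automatically sees whatever version the inherited environment currently holds.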

Those inherited runtimes then become the equivalent of the runtime layer in Flatpak: applications will automatically pick up new versions of the runtime, so the runtime maintainers are expected to strictly preserve backwards compatibility, and when that isn't possible, provide a new parallel-installable version, so apps using both the old and the new runtime can happily run side-by-side.

The idea behind that approach is to trade off a bit of inflexibility in the exact versions of some of your dependencies for the benefit of a reduction in data duplication on systems running multiple applications or environments: instead of specifying your full dependency set, you'd only specify that you depended on a particular common computational environment being available, plus whatever you needed that isn't part of the assumed platform.

As semi-isolated-applications-with-a-shared-runtime mechanisms like Flatpak gain popularity (vs fully isolated application & service silos), I'd expect this model to start making more of an appearance in the Linux distro world, as it's a natural way of mapping per-application venvs to the shared runtime model, and it doesn't require any changes to installers or applications to support it.

However, there's another approach that specifically tackles the content duplication problem, which would require a new installation layout as you suggest, but could still rely on *.pth files to make it implicitly compatible with existing packages and applications and existing Python runtime versions.

That approach is to create an install tree somewhere that looks like this:

    _shared-packages/
        <normalised-package-name>/
            <release-version>/
                <version-details>.dist-info/
                <installed-files>

Instead of installing full packages directly into a venv the way pip does, an installer that worked this way would manage a <normalised-package-name>.pth file that indicated "_shared-packages/<normalised-package-name>/<release-version>" should be added to sys.path.
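
To make that concrete, here is a minimal sketch of the "activation" step such an installer might perform. No existing tool is being described: activate_shared_package and normalise_name are hypothetical names, and using the PEP 503 rule for the normalised package name is an assumption on my part.

```python
import re
import tempfile
from pathlib import Path


def normalise_name(name: str) -> str:
    """Normalise a project name (assumption: the PEP 503 rule)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def activate_shared_package(shared_root: Path, venv_site_packages: Path,
                            name: str, version: str) -> Path:
    """Point a venv at _shared-packages/<name>/<version> via a .pth file.

    Hypothetical helper: nothing is copied into the venv, only a
    one-line <normalised-package-name>.pth file is written.
    """
    normalised = normalise_name(name)
    target = shared_root / normalised / version
    if not target.is_dir():
        raise FileNotFoundError(f"{target} is not in the shared store")
    pth_file = venv_site_packages / f"{normalised}.pth"
    # Rewriting this file is all it takes to switch releases.
    pth_file.write_text(f"{target}\n", encoding="utf-8")
    return pth_file


# Throwaway directories standing in for the shared store and one venv.
with tempfile.TemporaryDirectory() as tmp:
    shared_root = Path(tmp) / "_shared-packages"
    site_packages = Path(tmp) / "venv" / "site-packages"
    (shared_root / "example-pkg" / "1.0").mkdir(parents=True)
    site_packages.mkdir(parents=True)

    pth = activate_shared_package(shared_root, site_packages,
                                  "Example_Pkg", "1.0")
    pth_name = pth.name
    pth_target = pth.read_text(encoding="utf-8").strip()
    expected_target = str(shared_root / "example-pkg" / "1.0")
    print(pth_name, "->", pth_target)
```

Because there is one *.pth file per package, upgrading a package in a single venv amounts to rewriting one line, while other venvs keep pointing at the release directory they already reference.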

Each shared package directory could include references back to all of the venvs where it has been installed, allowing it to be removed once all of those venvs have either been updated to a new version or been removed entirely.

This is actually a *lot* like the way pkg_resources.requires() and self-contained egg directories work, but with the version selection shifted to the venv's site-packages directory, rather than happening implicitly in Python code on application startup.

An interesting point about this layout is that it would be amenable to a future enhancement that allowed for more relaxed MAJOR and MAJOR.MINOR qualifiers on the install directory references, permitting transparently shared maintenance and security updates.

The big downside of this layout is that it means you lose the ability to just bundle up an entire directory and unpack it on a different machine to get a probably-mostly-working environment. This means that while it's likely better for managing lots of environments on a single workstation (due to the reduced file duplication), it's likely to be worse for folks that work on only a handful of different projects at any given point in time (and I say that as someone with ~140 different local repository clones across my ~/devel, ~/fedoradevel and ~/rhdevel directories).

Cheers,
Nick.

[1] http://docs.flatpak.org/en/latest/introduction.html#how-it-works

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig