Hi all, I had a wacky idea and I can't tell if it's brilliant, or ridiculous, or both. Which makes sense given that I had a temperature of 39.3 °C when I thought of it, but even after getting better and turning it over in my mind for a while I still can't tell, so, I figured I'd put it out there and see what y'all thought :-).
The initial motivation was to try and clean up an apparent infelicity in how extras work. After poking at it for a bit, I realized that it might accidentally solve a bunch of other problems, including removing the need for dynamic metadata in sdists (!). If it can be made to work at all...

Motivating problem
------------------

"Extras" are pretty handy: e.g., if you just want the basic ipython REPL, you can do ``pip install ipython``, but if you want the fancy HTML notebook interface that's implemented in the same source code base but has lots of extra heavy dependencies, then you can do ``pip install ipython[notebook]`` instead.

Currently, extras are implemented via dedicated metadata fields inside each package -- when you ``pip install ipython[notebook]``, it downloads ipython-XX.whl, and when it reads the metadata about this package's dependencies and entry points, the ``notebook`` tag flips on some extra dependencies and extra entry points.

This worked pretty well for ``easy_install``'s purposes, since ``easy_install`` mostly cared about installing things; there was no concern for upgrading. But people do want to upgrade, and as ``pip`` gets better at this, extras are going to start causing problems. Hopefully soon, pip will get a proper resolver, and then an ``upgrade-all`` command like other package managers have. Once this happens, it's entirely possible -- indeed, a certainty in the long run -- that if you do::

    $ pip install ipython[notebook]
    # wait a week
    $ pip upgrade-all

you will no longer have the notebook installed, because the new version of ipython added a new dependency to the "notebook" extra, and since there's no record that you ever installed that extra, this new dependency won't be installed when you upgrade. I'm not sure what happens to any entry points or scripts that were part of ipython[notebook] but not ipython -- I'm guessing they'd still be present, but broken?

If you want to upgrade while keeping the notebook around, then ``upgrade-all`` is useless to you; you have to manually keep a list of all packages-with-extras you have installed and explicitly pass them to the upgrade command every time. Which is terrible UX.

Supporting extras in this manner also ramifies complexity through the system: e.g., the specification of entry points becomes more complex because you need a way to make them conditional on particular extras, PEP 426 proposes special mechanisms to allow package A to declare a dependency on extra B of package C, etc. And extras also have minor but annoying limitations, e.g. there's no mechanism provided to store a proper description of what an extra provides and why you might want it.

Solution (?): promoting extras to first-class packages
------------------------------------------------------

There's an obvious possible solution, inspired by how other systems (e.g. Debian) handle this situation: promote ``ipython[notebook]`` to a full-fledged package, one that happens to contain no files of its own, but which gets its own dependencies and other metadata. What would this package look like? Something like::

    Name: ipython[notebook]
    Version: 4.0.0
    Requires-Dist: ipython (== 4.0.0)
    Requires-Dist: extra_dependency_1
    Requires-Dist: extra_dependency_2
    Requires-Dist: ...

    The ``notebook`` extra extends IPython with an HTML interface to...

Installing it needs to automatically trigger the installation of ipython, so it should depend on ipython.
It needs to be upgraded in sync with ``ipython``, so this dependency should be an exact version dependency -- that way, upgrading ``ipython`` will (once we have a real resolver!) force an upgrade of ``ipython[notebook]`` and vice-versa. Then of course we also need to include the extra's unique dependencies, and whatever else we want (e.g. a description).

What would need to happen to get there from here? AFAICT a relatively small set of changes would actually suffice:

**PyPI:** starts allowing the upload of wheels named like ``BASE[EXTRA]-VERSION-COMPAT.whl``. They get special handling, though: whoever owns ``BASE`` gets to do whatever they like with names like ``BASE[EXTRA]`` -- that's an established rule -- so wheels following this naming scheme would be treated like other artifacts associated with the (BASE, VERSION) release. In particular, the uploader would need to have write permission to the ``BASE`` name, and it would remain impossible to register top-level distribution names containing square brackets.

**setuptools:** continues to provide "extra" metadata inside the METADATA file just as it does now (for backwards compatibility with old versions of pip that encounter new packages). In addition, though, the egg-info command would start generating .egg-info directories for each defined extra (according to the schema described above), the bdist_wheel command would start generating a wheel file for each defined extra, etc.

**pip:** uses a new and different mechanism for looking up packages with extras:

- when asked to fulfill a requirement for ``BASE[EXTRA1,EXTRA2,...] (> X)``, it should expand this to ``BASE[EXTRA1] (> X), BASE[EXTRA2] (> X), ...``, and then attempt to find wheels with those actual names (a rough sketch of this expansion follows at the end of this section)

- backcompat case: if we fail to find a BASE[EXTRA] wheel, then fall back to fetching a wheel named BASE and attempt to use the "extra" metadata inside it to generate BASE[EXTRA], and install this (this is morally similar to the fallback logic where if it can't find foo.whl it tries to generate it from the foo.zip sdist)

  - optionally, PyPI itself could auto-generate these wheels for legacy versions (since they can be generated automatically from static wheel metadata), thus guaranteeing that this path would never be needed, and then pip could disable this fallback path... but I guess it would still need it to handle non-PyPI indexes.

- if this fails, then it falls back to fetching an sdist named BASE (*not* BASE[EXTRA]) and attempting to build it (while making sure to inject a version of setuptools that's recent enough to include the above changes).

**PEP 426:** can delete all the special-case handling for extras, because they are no longer special cases, and there's no backcompat needed for a format that is not yet in use.

**twine and other workflows:** ``twine upload dist/*`` continues to do the right thing (now including the new extra wheels). Other workflows might need the obvious tweaking to include the new wheel files.

So this seems surprisingly achievable (except for the obvious glaring problem that I missed but someone is about to point out?): it would improve correctness in the face of upgrades, simplify our conceptual models, and provide a more solid basis for future improvements (e.g. if in the future we add better tracking of which packages were manually installed, then this will automatically apply to extras as well, since they are just packages).
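To make the expansion rule in the first pip bullet above concrete, here is a rough Python sketch -- the function name, the toy requirement grammar, and the regex are all just illustrative assumptions, not anything pip actually implements::

    import re

    # Toy requirement grammar: "BASE[EXTRA1,EXTRA2,...] (specifier)" --
    # just enough to illustrate the proposed expansion, not a real parser.
    _REQ_RE = re.compile(
        r"^\s*(?P<base>[A-Za-z0-9._-]+)"   # distribution name
        r"(?:\[(?P<extras>[^\]]+)\])?"     # optional [extra1,extra2,...]
        r"\s*(?P<spec>\(.*\))?\s*$"        # optional version specifier, e.g. "(> 4.0)"
    )

    def expand_extras(requirement):
        """Expand 'BASE[E1,E2] (spec)' into ['BASE[E1] (spec)', 'BASE[E2] (spec)'].

        A requirement without extras comes back unchanged, as a one-element list.
        """
        m = _REQ_RE.match(requirement)
        if m is None:
            raise ValueError("unparseable requirement: %r" % requirement)
        base, extras, spec = m.group("base"), m.group("extras"), m.group("spec")
        suffix = " " + spec if spec else ""
        if not extras:
            return [base + suffix]
        return ["%s[%s]%s" % (base, extra.strip(), suffix)
                for extra in extras.split(",")]

    print(expand_extras("ipython[notebook,test] (> 4.0)"))
    # -> ['ipython[notebook] (> 4.0)', 'ipython[test] (> 4.0)']

pip would then try to satisfy each expanded name as an ordinary requirement, and only when no BASE[EXTRA] wheel exists would it drop down to the legacy fallback paths described above.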
But wait there's more
---------------------

**Non-trivial plugins:** Once we've done this, suddenly extras become much more powerful. Right now it's a hard constraint that extras can only add new dependencies and entry points, not contain any code. But this is rather artificial -- from the user's point of view, 'foo[bar]' just means 'I want a version of foo that has the bar feature'; they don't care whether this requires installing some extra foo-to-bar shim code. With extras as first-class packages, it becomes possible to use this naming scheme for things like plugins or compiled extensions that add actual code.

**Build variants:** People keep asking for us to provide numpy builds against Intel's super-fancy closed-source (but freely redistributable) math library, MKL. It will never be the case that ``pip install numpy`` automatically gives you the MKL-ified version, because see above re: "closed source". But we could provide a numpy[mkl]-{VERSION}-{COMPAT}.whl with metadata like::

    Name: numpy[mkl]
    Conflicts: numpy
    Provides: numpy

which acts as a drop-in replacement for the regular numpy for those who explicitly request it via ``pip install numpy[mkl]``. This involves two new concepts on top of the ones above:

Conflicts: is missing from the current metadata standards but (I think?) trivial to implement in any real resolver. It means "I can't be installed at the same time as something else which matches this requirement". In a sense, it's actually an even more primitive concept than a versioned requirement -- Requires: foo (> 2.0) is equivalent to Requires: foo + Conflicts: foo (<= 2.0), but there's no way to expand an arbitrary Conflicts in terms of Requires. (A minor but important wrinkle: the word "else" is important there; you need a special case saying that a package never conflicts with itself. But I think that's the only tricky bit.)

Provides: is trickier -- there's some vestigial support in the current standards and even in pip, but AFAICT it hasn't really been worked out properly. The semantics are obvious enough (Provides: numpy means that this package counts as being numpy; there are some subtleties around what version of numpy it should count as, but I think that can be worked out), but it opens a can of worms, because you don't want to allow things like::

    Name: numpy
    Provides: django

But once you have the concept of a namespace for multiple distributions from the same project, then you can limit Provides: so that it's only legal if the provider distribution and the provided distribution have the same BASE (a toy sketch of this check follows just below). This solves the social problem (PyPI knows that numpy[mkl] and numpy are 'owned' by the same people, so this Provides: is OK), and provides algorithmic benefits (if you're trying to find some package that provides foo[extra] out of a flat directory of random distributions, then you only have to examine wheels and sdists that have BASE=foo).

The other advantage to having the package be ``numpy[mkl]`` instead of ``numpy-mkl`` is that it correctly encodes that the sdist is ``numpy.zip``, not ``numpy-mkl.zip`` -- the rules we derived to match how extras work now are actually exactly what we want here too.
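As a concrete illustration of the "same BASE" restriction on Provides: described above, here's a tiny Python sketch -- the helper names and the name grammar are made up; this is just the shape of the check, not anything PyPI or pip actually implements::

    import re

    # Toy grammar for distribution names: "BASE" or "BASE[extra]".
    _NAME_RE = re.compile(r"^(?P<base>[A-Za-z0-9._-]+)(?:\[(?P<extra>[^\]]+)\])?$")

    def base_name(dist_name):
        """Return the BASE part of a name like 'numpy[mkl]' (-> 'numpy')."""
        m = _NAME_RE.match(dist_name)
        if m is None:
            raise ValueError("bad distribution name: %r" % dist_name)
        return m.group("base")

    def provides_is_legal(provider, provided):
        """Provides: is only honoured when both names share the same BASE."""
        return base_name(provider) == base_name(provided)

    assert provides_is_legal("numpy[mkl]", "numpy")    # ok: same BASE
    assert not provides_is_legal("numpy", "django")    # rejected: different BASE

The same BASE filter is what gives the algorithmic benefit mentioned above: a resolver looking for something that provides foo[extra] only ever has to examine artifacts whose BASE is foo.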
**ABI tracking:** This also solves another use case entirely: the numpy ABI tracking problem (which is probably the single #1 problem the numerical crowd has with current packaging, because it actually prevents us making basic improvements to our code -- the reason I've been making a fuss about other things first is that until now I couldn't figure out any tractable way to solve this problem, but now I have hope). Once you have Provides: and a namespace to use with it, then you can immediately start using "pure virtual" packages to keep track of which ABIs are provided by a single distribution, and determine that these two packages are consistent::

    Name: numpy
    Version: 1.9.2
    Provides: numpy[abi-2]
    Provides: numpy[abi-3]

    Name: scipy
    Depends: numpy
    Depends: numpy[abi-2]

(AFAICT this would actually make pip *better* than conda as far as numpy's needs are concerned.)

The build variants and virtual packages bits also work neatly together. If SciPy wants to provide builds against multiple versions of numpy during the transition period between two ABIs, then these are build variants exactly like numpy[mkl]. For their 0.17.0 release they can upload::

    scipy-0.17.0.zip
    scipy[numpy-abi-2]-0.17.0.whl
    scipy[numpy-abi-3]-0.17.0.whl

(And again, it would be ridiculous to have to register scipy-numpy-abi-2, scipy-numpy-abi-3, etc. on PyPI, and upload separate sdists for each of them. Note that there's nothing magical about the names -- those are just arbitrary tags chosen by the project; what pip would care about is that one of the wheels' metadata says Requires-Dist: numpy[abi-2] and the other says Requires-Dist: numpy[abi-3].)

So far as I can tell, these build variant cases therefore cover *all of the situations that were discussed in the previous thread* as reasons why we can't necessarily provide static metadata for an sdist. The numpy sdist can't statically declare a single set of install dependencies for the resulting wheel... but it could give you a menu, and say that it knows how to build numpy.whl, numpy[mkl].whl, or numpy[external-blas].whl, and tell you what the dependencies will be in each case. (And maybe it's also possible to make numpy[custom] by manually editing some configuration file or whatever, but pip would never be called upon to do this so it doesn't need the static metadata.)

So I think this would be sufficient to let us start providing full static metadata inside sdists?

(Concretely, I imagine that the way this would work is that when we define the new sdist hooks, one of the arguments that pip would pass in when running the build system would be a list of the extras that it's hoping to see, e.g. "the user asked for numpy[mkl], please configure yourself accordingly". For legacy setuptools builds that just use traditional extras, this could safely be ignored.)

TL;DR
-----

If we:

- implement a real resolver, and
- add a notion of a per-project namespace of distribution names, that are collected under the same PyPI registration and come from the same sdist, and
- add Conflicts:, and Provides:,

then we can elegantly solve a collection of important and difficult problems, and we can retroactively pun the old extras system onto the new system in a way that preserves 100% compatibility with all existing packages.

I think? What do you think?

-n

--
Nathaniel J. Smith -- http://vorpus.org

_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig