On Tue, Oct 27, 2015 at 1:12 PM, Daniel Holth <dho...@gmail.com> wrote:
> The drawback of .zip is file size since it compresses each file
> individually rather than giving the compression algorithm a larger input,
> it's a great format otherwise. Ubiquitous including Apple iOS packages,
> Java, word processor file formats. And most Python packages are small.
>

I don't really buy the indexing advantages, especially w/ the current
implementation of zipfile in python (e.g. loading the whole list of
archive members at creation time)

A common way to solve the fast metadata access from an archive is to
archive the metadata and the data separately (e.g. a zipfile containing 2
zipfiles, one being the original sdist, the other one containing the
metadata).

David

> We must do the hard work to support Unicode file names, and spaces and
> accent marks in home directory names (historically a problem on Windows),
> in our packaging system. It is the right thing to do. It is not the
> publisher's fault that your system has broken Unicode.
>
> On Tue, Oct 27, 2015 at 6:43 AM Paul Moore <p.f.mo...@gmail.com> wrote:
>
>> On 26 October 2015 at 06:04, Nathaniel Smith <n...@pobox.com> wrote:
>> > Here's a second round of text towards making a build-system
>> > independent interface between pip and source trees/sdists. My idea
>> > this time is to take a divide-and-conquer approach: this text tries to
>> > summarize all the stuff that it seemed like we had mostly reached
>> > consensus on in the previous thread + call, with blank chunks marked
>> > "TBD" where there are specific points that still need To Be
>> > Determined. So my hope is that everyone will read what's here and
>> > agree that it's great as far as it goes, and then we can go through
>> > and fill in each missing piece one at a time.
>>
>> I'll comment on what's here, but ignore the TBD items - I'd rather (as
>> you suggest) leave discussion of those details till the basic idea is
>> agreed.
>>
>> > Abstract
>> > ========
>> >
>> > Distutils delenda est.
>>
>> While this makes a nice tagline, I'd rather something less negative.
>> Distutils does not "need" to be destroyed. It's perfectly adequate
>> (although hardly user friendly) for a lot of cases - I'd be willing to
>> suggest *most* users can work just fine with distutils.
>>
>> I'm not a fan of distutils, but I'd prefer it if we kept the rhetoric
>> limited - as Nick pointed out this whole area is as much a political
>> issue as a technical one.
>>
>> > Extended abstract
>> > =================
>> >
>> > While ``distutils`` / ``setuptools`` have taken us a long way, they
>> > suffer from three serious problems: (a) they're missing important
>> > features like autoconfiguration and usable build-time dependency
>> > declaration, (b) extending them is quirky, complicated, and fragile,
>> > (c) it's very difficult to use anything else, because they provide the
>> > standard interface for installing python packages expected by both
>> > users and installation tools like ``pip``.
>>
>> Again, this is overstated. You very nearly lost me right here - people
>> won't read the details of the proposal if they disagree with the
>> abstract(s). Specifically:
>>
>> * The features in (a) are only important to *some* parts of the
>> community. The scientific community is the major one, and is a huge
>> influence over the direction we want to go in, but again, not crucial
>> to many people. And even where they might be useful (e.g., Windows
>> users building pyyaml, lxml, pillow, ...) the description implies
>> "working out what's there" rather than "allowing users to easily
>> manage non-Python dependencies", which gives the wrong impression.
>>
>> * The features in (b) are highly specialised. Very few people extend
>> setuptools/distutils. And those who do, have often invested a lot of
>> effort in doing so. Sure, they'd rather not have needed to, but now
>> that they have, a replacement system simply means that work is lost.
>> Arguably, fixing (b) is only useful for people (like the scientific
>> community) who have needed to extend setuptools and have been unable
>> to achieve their goals that way. That's an even smaller part of the
>> community.
>>
>> > Previous efforts (e.g. distutils2 or setuptools itself) have attempted
>> > to solve problems (a) and/or (b). We propose to solve (c).
>>
>> Agreed - this is a good approach. But it's at odds with your abstract,
>> which says distutils must die. Here you're saying you want to allow
>> people to keep using distutils but allow people with specialised needs
>> to choose an alternative. Or are you offering an alternative to people
>> who use distutils?
>>
>> The whole of the above is confusing on the face of it. The details
>> below clarify a lot, as does knowing how the previous discussions have
>> gone. But it would help a lot if the introduction to this PEP were
>> clearer.
>>
>> > The goal of this PEP is to get distutils-sig out of the business of being
>> > a gatekeeper for Python build systems. If you want to use distutils,
>> > great; if you want to use something else, then that should be easy to
>> > do using standardized methods. The difficulty of interfacing with
>> > distutils means that there aren't many such systems right now, but to
>> > give a sense of what we're thinking about see `flit
>> > <https://github.com/takluyver/flit>`_ or `bento
>> > <https://cournape.github.io/Bento/>`_. Fortunately, wheels have now
>> > solved many of the hard problems here -- e.g. it's no longer necessary
>> > that a build system also know about every possible installation
>> > configuration -- so pretty much all we really need from a build system
>> > is that it have some way to spit out standard-compliant wheels.
>>
>> OK. Although I see a risk here that if I want to build package FOO, I
>> now have to worry whether FOO's build system supports Windows, as well
>> as worrying whether FOO itself supports Windows.
>>
>> There's still a role for some "gatekeeper" (not a good word IMO, maybe
>> "coordinator") to provide a certain level of support or review of
>> build systems, and a point of contact for users with build issues (the
>> point of this proposal is to some extent that people don't need to
>> *know* what build system a project uses, so suggesting everyone has to
>> direct issues to the correct build system support forum isn't
>> necessarily practical).
>>
>> > We therefore propose a new, relatively minimal interface for
>> > installation tools like ``pip`` to interact with package source trees
>> > and source distributions.
>> >
>> > In addition, we propose a wheel-inspired static metadata format for
>> > sdists, suitable for tools like PyPI and pip's resolver.
>> >
>> >
>> > Terminology and goals
>> > =====================
>> >
>> > A *source tree* is something like a VCS checkout. We need a standard
>> > interface for installing from this format, to support usages like
>> > ``pip install some-directory/``.
>> >
>> > A *source distribution* is a static snapshot representing a particular
>> > release of some source code, like ``lxml-3.4.4.zip``. Source
>> > distributions serve many purposes: they form an archival record of
>> > releases, they provide a stupid-simple de facto standard for tools
>> > that want to ingest and process large corpora of code, possibly
>> > written in many languages (e.g. code search), they act as the input to
>> > downstream packaging systems like Debian/Fedora/Conda/..., and so
>> > forth. In the Python ecosystem they additionally have a particularly
>> > important role to play, because packaging tools like ``pip`` are able
>> > to use source distributions to fulfill binary dependencies, e.g. if
>> > there is a distribution ``foo.whl`` which declares a dependency on
>> > ``bar``, then we need to support the case where ``pip install bar`` or
>> > ``pip install foo`` automatically locates the sdist for ``bar``,
>> > downloads it, builds it, and installs the resulting package.
>>
>> This is somewhat misleading, given that you go on to specify the
>> format below, but maybe that's only an issue for someone like me who
>> saw the previous debate over "source distribution" (as a bundled up
>> source tree) vs "sdist" as a specified format. If I understand, you've
>> now discarded the former sense of source distribution, and are
>> sticking with the latter (specified format) definition.
>>
>> > Source distributions are also known as "sdists" for short.
>> >
>> >
>> > Source trees
>> > ============
>> >
>> > We retroactively declare the legacy source tree format involving
>> > ``setup.py`` to be "version 0". We don't try to specify it further;
>> > its de facto specification is encoded in the source code and
>> > documentation of ``distutils``, ``setuptools``, ``pip``, and other
>> > tools.
>> >
>> > A "version 1" (or greater) source tree is any directory which contains
>> > a file named ``pypackage.cfg``, which will -- in some manner whose
>> > details are TBD -- describe the package's build dependencies and how
>> > to invoke the build system. This mechanism:
>> >
>> > - Will allow for both static and dynamic specification of build
>> > dependencies
>> >
>> > - Will have some degree of isolation of different builds from each
>> > other, so that it will be possible for a single run of pip to install
>> > one package that build-depends on ``foo = 1.1`` and another package
>> > that build-depends on ``foo = 1.2``.
>>
>> All good so far.
>>
>> > - Will leave the actual installation of the package in the hands of
>> > the build/installation tool (i.e. individual package build systems
>> > will not need to know about things like --user versus --global or make
>> > decisions about when and how to modify .pth files)
>>
>> This seems completely backwards to me. It's pip's job to do the actual
>> install. The build tool should *only* focus on generating standard
>> conforming binary wheels - otherwise what's the point of the
>> separation of concerns that wheels provide?
>>
>> Or maybe I'm confused by the term "build/installation tool" - by that
>> did you actually mean pip, rather than the build system?
>>
>> (TBDs omitted)
>>
>> > Source distributions
>> > ====================
>> >
>> > [possibly this should get split off into a separate PEP, but I'll keep
>> > it together for now for ease of discussion]
>> >
>> > A "version 1" (or greater) source distribution is a file meeting the
>> > following criteria:
>> >
>> > - It MUST have a name of the form: {PACKAGE}-{VERSION}.{EXT}, where
>> > {PACKAGE} is the package name, {VERSION} is a PEP 440-compliant
>> > version number, and {EXT} is a compliant archive format.
>> >
>> > The set of compliant archive formats is: zip, [TBD]
>> >
>> > [QUESTION: should we continue to allow .tar.gz and friends? In
>> > practice by "allow" I mean something like "accept new-style sdists on
>> > PyPI in this format". I'm inclined not to -- zip is the most
>> > universally supported format around, it allows file-based random
>> > access (unlike tar-based things) which is useful for pulling out
>> > metadata without decompressing the whole thing, and standardizing on
>> > one format dodges distracting and pointless discussions about which
>> > format to use, i.e. it's TOOWTDI-compliant. Of course pip is free to
>> > continue to support other archive formats when passed explicitly on
>> > the command line. Any objections?]
>>
>> +1 on having a single archive format, and zip seems like the best choice.
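
For illustration, the "pull out metadata without decompressing the whole
thing" point can be sketched with the standard library's zipfile module.
The archive contents and the member path below are made-up examples that
follow the {PACKAGE}-{VERSION}.sdist-info naming from the draft; no
existing tool is claimed to produce this layout. Note that ZipFile still
reads the archive's central directory when it is opened, so what is
avoided is decompressing the other members, not listing them.

```python
import io
import zipfile


def build_example_sdist() -> bytes:
    """Build a tiny in-memory zip shaped like the proposed sdist (hypothetical layout)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("foo-1.0/foo.py", "print('hello')\n")
        zf.writestr(
            "foo-1.0/foo-1.0.sdist-info/METADATA",
            "Name: foo\nVersion: 1.0\n",
        )
    return buf.getvalue()


def read_member(archive_bytes: bytes, member: str) -> bytes:
    """Return the bytes of a single member; other members are not decompressed."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.read(member)


if __name__ == "__main__":
    data = build_example_sdist()
    print(read_member(data, "foo-1.0/foo-1.0.sdist-info/METADATA").decode("utf-8"))
```

With a tar.gz the equivalent lookup has to decompress the stream up to
the wanted member, which is the difference the draft is pointing at.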
>>
>> > Similar to wheels, the archive is Unicode, and the filenames inside
>> > the archive are encoded in UTF-8.
>>
>> This isn't the job of the sdist format to specify. It should be
>> implicit in the choice of archive format.
>>
>> Having said that, I'd go with
>>
>> 1. The sdist filename MUST support the full range of package names as
>> specified in PEP 426 (https://www.python.org/dev/peps/pep-0426/#name)
>> and versions as in PEP 440
>> (https://www.python.org/dev/peps/pep-0440/). That's actually far less
>> than full Unicode.
>> 2. The archive format MUST support arbitrary Unicode filenames. That
>> means zip is OK, but tar.gz isn't unless you specify UTF-8 is used
>> (the tar format doesn't allow for an encoding declaration - see
>> https://docs.python.org/3.5/library/tarfile.html#tar-unicode for
>> details on Unicode issues in the tar format).
>>
>> Having said that I'd also go with "filenames in the archive SHOULD be
>> limited to ASCII" - because we have had issues with pip where test
>> files have Unicode filenames, and builds break because they get
>> mangled on systems with weird encoding setups... IIRC, these are
>> typically related to .tar.gz sdists, which (due to the lack of
>> encoding support) result in files being unpacked with the wrong names.
>> So maybe if we enforce zip format we don't need to add this
>> limitation.
>>
>> > - When unpacked, it MUST contain a single directory tree
>> > named ``{PACKAGE}-{VERSION}``.
>> >
>> > - This directory tree MUST be a valid version 1 (or greater) source
>> > tree as defined above.
>> >
>> > - It MUST additionally contain a directory named
>> > ``{PACKAGE}-{VERSION}.sdist-info`` (notice the ``s``), with the
>> > following contents:
>> >
>> >   - ``SDIST``: Mandatory. Same record-oriented format as a wheel's
>> >     ``WHEEL`` file, but with different fields::
>> >
>> >       SDist-Version: 1.0
>> >       Generator: setuptools sdist 20.1
>> >
>> >     ``SDist-Version`` is the version number of this specification.
>> >     Software that processes sdists should warn if ``SDist-Version`` is
>> >     greater than the version it supports, and must fail if
>> >     ``SDist-Version`` has a greater major version than the version it
>> >     supports.
>> >
>> >     ``Generator`` is the name and optionally the version of the
>> >     software that produced the archive.
>> >
>> >   - ``RECORD``: Mandatory. A list of all files contained in the sdist
>> >     (except for the RECORD file itself and any signature files) together
>> >     with their hashes, as specified in PEP 427.
>> >
>> >   - ``RECORD.jws``, ``RECORD.p7s``: Optional. Signature files as
>> >     specified in PEP 427.
>> >
>> >   - ``METADATA``: Mandatory. Metadata version 1.1 or greater format
>> >     metadata, with an additional rule that fields may contain the special
>> >     sentinel value ``__SDIST_DYNAMIC__``, which indicates that the value
>> >     of this field cannot be determined until build time. If a "multiple
>> >     use field" is present with the value ``__SDIST_DYNAMIC__``, then this
>> >     field MUST occur exactly once, e.g.::
>> >
>> >       # Okay:
>> >       Requires-Dist: lxml (> 3.3)
>> >       Requires-Dist: requests
>> >
>> >       # no Requires-Dist lines at all is okay
>> >       # (meaning: this package's requirements are the empty set)
>> >
>> >       # Okay, requirements will be determined at build time:
>> >       Requires-Dist: __SDIST_DYNAMIC__
>> >
>> >       # NOT okay:
>> >       Requires-Dist: lxml (> 3.3)
>> >       Requires-Dist: __SDIST_DYNAMIC__
>> >
>> >     (The use of a special token allows us to distinguish between
>> >     multiple use fields whose value is statically the empty list versus
>> >     one whose value is dynamic; it also allows us to distinguish between
>> >     optional fields which are statically not present versus ones whose
>> >     value is dynamic.)
>> >
>> > When this sdist is built, the resulting wheel MUST have metadata
>> > which is identical to the metadata present in this file, except that
>> > any fields with value ``__SDIST_DYNAMIC__`` in the sdist may have
>> > arbitrary values in the wheel.
>> >
>> > A valid sdist MUST NOT use the ``__SDIST_DYNAMIC__`` mechanism for
>> > the package name or version (i.e., these must be given statically),
>> > and these MUST match the {PACKAGE} and {VERSION} of the sdist as
>> > described above.
>>
>> This seems pretty good at first reading.
>>
>> > [TBD: do we want to forbid the use of dynamic metadata for any
>> > other fields? I assume PyPI will enforce some stricter rules at least,
>> > but I don't know if we want to make that part of the spec, or just
>> > part of PyPI's administrative rules.]
>>
>> This covers the main point of contention. It would be bad if build
>> systems started using __SDIST_DYNAMIC__ just because "it's easier".
>>
>> Maybe add
>>
>> * A valid sdist SHOULD NOT use the __SDIST_DYNAMIC__ mechanism any
>> more than necessary (i.e., if the metadata is the same in all
>> generated wheels, it does not need to use the __SDIST_DYNAMIC__
>> mechanism, and so should not do so).
>>
>> > This is intentionally a close analogue of a wheel's ``.dist-info``
>> > directory; intention is that as future metadata standards are defined,
>> > the specifications for the ``.sdist-info`` and ``.dist-info``
>> > directories will evolve in synchrony.
>> >
>> >
>> > Evolutionary notes
>> > ==================
>> >
>> > A goal here is to make it as simple as possible to convert old-style
>> > sdists to new-style sdists. (E.g., this is one motivation for
>> > supporting dynamic build requirements.) The ideal would be that there
>> > would be a single static pypackage.cfg that could be dropped into any
>> > "version 0" VCS checkout to convert it to the new shiny. This is
>> > probably not 100% possible, but we can get close, and it's important
>> > to keep track of how close we are... hence this section.
>> >
>> > A rough plan would be: Create a build system package
>> > (``setuptools_pypackage`` or whatever) that knows how to speak
>> > whatever hook language we come up with, and convert them into
>> > setuptools calls. This will probably require some sort of hooking or
>> > monkeypatching to setuptools to provide a way to extract the
>> > ``setup_requires=`` argument when needed, and to provide a new version
>> > of the sdist command that generates the new-style format. This all
>> > seems doable and sufficient for a large proportion of packages (though
>> > obviously we'll want to prototype such a system before we finalize
>> > anything here). (Alternatively, these changes could be made to
>> > setuptools itself rather than going into a separate package.)
>> >
>> > But there remain two obstacles that mean we probably won't be able to
>> > automatically upgrade packages to the new format:
>> >
>> > 1) There currently exist packages which insist on particular packages
>> > being available in their environment before setup.py is executed. This
>> > means that if we decide to execute build scripts in an isolated
>> > virtualenv-like environment, then projects will need to check whether
>> > they do this, and if so then when upgrading to the new system they
>> > will have to start explicitly declaring these dependencies (either via
>> > ``setup_requires=`` or via static declaration in ``pypackage.cfg``).
>> >
>> > 2) There currently exist packages which do not declare consistent
>> > metadata (e.g. ``egg_info`` and ``bdist_wheel`` might get different
>> > ``install_requires=``). When upgrading to the new system, projects
>> > will have to evaluate whether this applies to them, and if so they
>> > will need to either stop doing that, or else add ``__SDIST_DYNAMIC__``
>> > annotations at appropriate places.
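
To make the "consistent metadata" requirement concrete, here is a rough
sketch of the check the draft implies: a built wheel's metadata has to
equal the sdist's metadata, except for fields the sdist marked with
``__SDIST_DYNAMIC__``. The function name and the dict-of-lists
representation are assumptions made for the example, and treating the
order of values within a multiple-use field as irrelevant is also an
assumption; the draft does not say either way.

```python
SENTINEL = "__SDIST_DYNAMIC__"


def wheel_matches_sdist(sdist_fields, wheel_fields):
    """Compare parsed metadata, each given as {field name: [values]}.

    Hypothetical helper: fields whose only sdist value is the sentinel
    may take any value in the wheel; every other field must be equal
    (order of repeated values is ignored, which is an assumption).
    """
    for key in set(sdist_fields) | set(wheel_fields):
        sdist_values = sdist_fields.get(key, [])
        if sdist_values == [SENTINEL]:
            continue  # dynamic in the sdist: the wheel may say anything
        if sorted(sdist_values) != sorted(wheel_fields.get(key, [])):
            return False
    return True


if __name__ == "__main__":
    sdist = {"Name": ["foo"], "Version": ["1.0"], "Requires-Dist": [SENTINEL]}
    wheel = {"Name": ["foo"], "Version": ["1.0"], "Requires-Dist": ["lxml (> 3.3)"]}
    print(wheel_matches_sdist(sdist, wheel))
```

A project in the situation of obstacle 2 would fail this check for any
field that differs between builds, unless that field is marked dynamic.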
>> >
>> > We'll also presumably need some API for packages to describe which
>> > parts of the METADATA file should be marked ``__SDIST_DYNAMIC__``, for
>> > the packages that need it (a new argument to ``setup()`` or some
>> > setting in ``setup.cfg`` or something).
>>
>> I'm confused here. And it's just now become clear *why* I'm confused.
>>
>> The sdist format MUST be a generated format - i.e., we should insist
>> (in principle at least) that it's only ever generated by tools.
>> Otherwise it's way too easy for people to just zip up their source
>> tree, hand craft something generic (that over-uses __SDIST_DYNAMIC__)
>> and say "here's an sdist". Obviously, people always *can* manually
>> create an sdist but we need to pin down the spec tightly, or we've not
>> improved things.
>>
>> That's why I'm concerned about __SDIST_DYNAMIC__ and it's also what
>> confuses me about the above transition plan.
>>
>> For people using setuptools currently, the transition should be simply
>> that they upgrade setuptools, and the "setup.py sdist" command in the
>> new setuptools generates the new sdist format. By default, the
>> setuptools sdist process assumes everything is static and requires the
>> user to modify the setup.py to explicitly mark which metadata they
>> want to be left to build time. That way, we get a relatively
>> transparent transition, while avoiding overuse of dynamic metadata.
>>
>> If setup.py has to explicitly mark dynamic metadata, that also allows
>> us to reject attempts to make name and version dynamic. Which is good.
>>
>> Paul
>> _______________________________________________
>> Distutils-SIG maillist - Distutils-SIG@python.org
>> https://mail.python.org/mailman/listinfo/distutils-sig
>>
>
> _______________________________________________
> Distutils-SIG maillist - Distutils-SIG@python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>
>
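
As a concrete reading of the ``__SDIST_DYNAMIC__`` rules quoted above
(a dynamic multiple-use field must occur exactly once, and name and
version may never be dynamic), a tool could reject bad sdists with
something like the sketch below. The parser is deliberately simplified
and is an assumption of the example: it takes one "Key: value" pair per
line and ignores continuation lines and the description body that real
METADATA files can contain. The function names are made up.

```python
SENTINEL = "__SDIST_DYNAMIC__"


def parse_fields(text):
    """Very simplified parser: one 'Key: value' per line, no continuations."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines and anything without a colon
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def check_sdist_metadata(text):
    """Return a list of rule violations; an empty list means the checks pass."""
    errors = []
    fields = parse_fields(text)
    for key, values in fields.items():
        # A field marked dynamic must not also carry static values.
        if SENTINEL in values and len(values) > 1:
            errors.append(f"{key}: dynamic marker must be the only value")
    for key in ("Name", "Version"):
        # Name and version have to be given statically.
        if SENTINEL in fields.get(key, []):
            errors.append(f"{key}: must be static")
    return errors


if __name__ == "__main__":
    bad = (
        "Name: foo\n"
        "Version: 1.0\n"
        "Requires-Dist: lxml (> 3.3)\n"
        "Requires-Dist: __SDIST_DYNAMIC__\n"
    )
    for problem in check_sdist_metadata(bad):
        print(problem)
```

This only covers the two MUST rules; the proposed SHOULD NOT ("no more
dynamic than necessary") cannot be checked from a single sdist.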