Hi all,

I just attempted an experimental refactor/streamlining of PEP 517, to
match what I think it should look like :-). I haven't submitted it as
a PR to the PEPs repository yet since I don't know if others will
agree with the changes, but I've pasted the full text below, or you
can see the text online at:

    https://github.com/njsmith/peps/blob/517-refactor-streamline/pep-0517.txt

and the diff at:

    https://github.com/python/peps/compare/master...njsmith:517-refactor-streamline?expand=1

Briefly, the changes are:

- Rearrange text into (hopefully) better signposted sections with
  better organization

- Clarify a number of details that have come up in discussion (e.g.,
  be more explicit that the hooks are run with the process working
  directory set to the source tree, and why)

- Drop prepare_wheel_metadata and prepare_wheel_build_files (for now);
  add detailed rationale for why we might want to add them back later.

- Add an "extensions" hook namespace to allow prototyping of future
  extensions.

- Rename get_build_*_requires -> get_requires_for_build_* to make the
  naming parallelism more obvious

- Add the option to declare an operation unsupported by returning
  NotImplemented

- Instead of defining a default value for get_requires_for_build_*,
  make it mandatory for get_requires_for_build_* and build_* to appear
  together; this seems simpler now that we have multiple high-level
  operations defined in the same PEP, and also simplifies the
  definition of the NotImplemented semantics.

- Update title to better match the scope we ended up with

- Add a TODO to decide how to handle backends that don't want to have
  multiple hooks called from the same process, including some
  discussion of the options.

---------------------

PEP: 517
Title: Supporting non-setup.py-based build backends in pyproject.toml
Version: $Revision$
Last-Modified: $Date$
Author: Nathaniel J. Smith <n...@pobox.com>,
        Thomas Kluyver <tho...@kluyver.me.uk>
BDFL-Delegate: Nick Coghlan <ncogh...@gmail.com>
Discussions-To: <distutils-sig@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Sep-2015
Post-History: 1 Oct 2015, 25 Oct 2015, 1 July 2017


==========
 Abstract
==========

While ``distutils`` / ``setuptools`` have taken us a long way, they
suffer from three serious problems: (a) they're missing important
features like usable build-time dependency declaration,
autoconfiguration, and even basic ergonomic niceties like `DRY
<https://en.wikipedia.org/wiki/Don%27t_repeat_yourself>`_-compliant
version number management, and (b) extending them is difficult, so
while there do exist various solutions to the above problems, they're
often quirky, fragile, and expensive to maintain, and yet (c) it's
very difficult to use anything else, because distutils/setuptools
provide the standard interface for installing packages expected by
both users and installation tools like ``pip``.

Previous efforts (e.g. distutils2 or setuptools itself) have attempted
to solve problems (a) and/or (b). This proposal aims to solve (c).

The goal of this PEP is to get distutils-sig out of the business of being
a gatekeeper for Python build systems. If you want to use distutils,
great; if you want to use something else, then that should be easy to
do using standardized methods. The difficulty of interfacing with
distutils means that there aren't many such systems right now, but to
give a sense of what we're thinking about see `flit
<https://github.com/takluyver/flit>`_ or `bento
<https://cournape.github.io/Bento/>`_. Fortunately, wheels have now
solved many of the hard problems here -- e.g. it's no longer necessary
that a build system also know about every possible installation
configuration -- so pretty much all we really need from a build system
is that it have some way to spit out standard-compliant wheels and
sdists.

We therefore propose a new, relatively minimal interface for
installation tools like ``pip`` to interact with package source trees
and source distributions.

==========================
 Reversion to Draft Status
==========================

While this PEP was provisionally accepted for implementation in ``pip``
and other tools, some additional concerns were subsequently raised
around adequately supporting out-of-tree builds. It has been reverted
to Draft status while those concerns are being resolved.

=======================
 Terminology and goals
=======================

A *source tree* is something like a VCS checkout. We need a standard
interface for installing from this format, to support usages like
``pip install some-directory/``.

A *source distribution* is a static snapshot representing a particular
release of some source code, like ``lxml-3.4.4.tar.gz``. Source
distributions serve many purposes: they form an archival record of
releases, they provide a stupid-simple de facto standard for tools
that want to ingest and process large corpora of code, possibly
written in many languages (e.g. code search), they act as the input to
downstream packaging systems like Debian/Fedora/Conda/..., and so
forth. In the Python ecosystem they additionally have a particularly
important role to play, because packaging tools like ``pip`` are able
to use source distributions to fulfill binary dependencies, e.g. if
there is a distribution ``foo.whl`` which declares a dependency on
``bar``, then we need to support the case where ``pip install bar`` or
``pip install foo`` automatically locates the sdist for ``bar``,
downloads it, builds it, and installs the resulting package.

Source distributions are also known as *sdists* for short.

A *build frontend* is a tool that users might run that takes arbitrary
source trees or source distributions and builds wheels from them. The
actual building is done by each source tree's *build backend*. In a
command like ``pip wheel some-directory/``, pip is acting as a build
frontend.

An *integration frontend* is a tool that users might run that takes a
set of package requirements (e.g. a requirements.txt file) and
attempts to update a working environment to satisfy those
requirements. This may require locating, building, and installing a
combination of wheels and sdists. In a command like ``pip install
lxml==2.4.0``, pip is acting as an integration frontend.


==============
 Source trees
==============

There is an existing, legacy source tree format involving
``setup.py``. We don't try to specify it further; its de facto
specification is encoded in the source code and documentation of
``distutils``, ``setuptools``, ``pip``, and other tools. We'll refer
to it as the ``setup.py``\-style.

Here we define a new style of source tree based around the
``pyproject.toml`` file defined in PEP 518, extending the
``[build-system]`` table in that file with one additional key,
``build-backend``. Here's an example of how it would look::

    [build-system]
    # Defined by PEP 518:
    requires = ["flit"]
    # Defined by this PEP:
    build-backend = "flit.api:main"

``build-backend`` is a string naming a Python object that will be
used to perform the build (see below for details). This is formatted
following the same ``module:object`` syntax as a ``setuptools`` entry
point. For instance, if the string is ``"flit.api:main"`` as in the
example above, this object would be looked up by executing the
equivalent of::

    import flit.api
    backend = flit.api.main

It's also legal to leave out the ``:object`` part, e.g. ::

    build-backend = "flit.api"

which acts like::

    import flit.api
    backend = flit.api

Formally, the string should satisfy this grammar::

    identifier = (letter | '_') (letter | '_' | digit)*
    module_path = identifier ('.' identifier)*
    object_path = identifier ('.' identifier)*
    entry_point = module_path (':' object_path)?

And we import ``module_path`` and then lookup
``module_path.object_path`` (or just ``module_path`` if
``object_path`` is missing).

If the ``pyproject.toml`` file is absent, or the ``build-backend``
key is missing, the source tree is not using this specification, and
tools should fall back to running ``setup.py``.

Where the ``build-backend`` key exists, it takes precedence over
``setup.py``, and source trees need not include ``setup.py`` at all.
Projects may still wish to include a ``setup.py`` for compatibility
with tools that do not use this spec.


=========================
 Build backend interface
=========================

The build backend object is expected to have callable attributes
called "hooks", which the build frontend can use to perform various
actions. The two high-level actions defined by this spec are creation
of an sdist (analogous to the legacy ``setup.py sdist`` command) and
building of a wheel (analogous to the legacy ``setup.py bdist_wheel``
command). We additionally define a namespace for tool-specific
hooks, which may be useful for prototyping future extensions to this
specification.


General rules for all hooks
---------------------------


Finding the source tree
~~~~~~~~~~~~~~~~~~~~~~~

All hooks are run with the process working directory set to the root
of the source tree (i.e., the directory containing
``pyproject.toml``). To find the source tree, hooks should call
``os.getcwd()`` or equivalent.

Rationale: the process working directory has to be set to something,
and if we were to leave it up to the build frontend to pick, then
package developers would accidentally write code that assumes a
particular answer here (example: ``long_desc =
open("README.rst").read()``), and this code would break when used with
other build frontends. So it's important that we standardize a value
for all build frontends to use consistently. The source tree root is
the obvious value to standardize on, especially because it's
compatible with popular and long-standing conventions like calling
``open("README.rst").read()``. Then, given that we've decided to
standardize on working directory = source directory, it makes sense to
say that this is the *only* way that this information is passed,
because providing a second, redundant way (example: as an explicit
argument to hooks) would only increase the possibility of error
without any benefit.


Lifecycle
~~~~~~~~~

XX TODO: do we want to require frontends to use a new process for
every hook call, or do we want to require backends to support multiple
calls from the same process? Apparently scons and setuptools both can
get cranky if you try to invoke them twice from the same process, so
*someone* will be spawning extra processes here; the question is where
to put that responsibility. The basic trade-off is that making it the
backend's responsibility has better best-case performance if both the
frontend and backend are able to re-use a single host process; but, if
common frontends end up using new processes for each hook call for
other reasons, then in practice either backends will end up spawning
unnecessary extra processes, or else will end up with poorly tested
paths when multiple hooks are run in the same process.

Given that ``get_requires_for_build_*`` → ``build_*`` in general
requires changing the Python environment, it doesn't necessarily make
sense to run these in the same process anyway. However, there's an
important special case where it does: when ``get_requires_for_build_*``
returns ``[]``. And this is probably by far the most common case.

Does it even matter? Windows is notoriously slow at spawning
subprocesses. As a quick test, I tried measuring the time to spawn
CPython 3.6 + import a package on a Windows 10 VM running on my
laptop. ``python3.6 -c "import flit"`` was about 300 ms per call;
``python3.6 -c "import setuptools"`` was about 600 ms per call.

We could also potentially get fancy and have a flag to let the
frontend and backend negotiate this (e.g. ``process_reuse_safe`` as an
opt-in flag). This could also be added later as an extension, as long
as we initially default to requiring separate processes for each hook.


Calling conventions
~~~~~~~~~~~~~~~~~~~

Hooks MAY be called with positional or keyword arguments, so backends
implementing them MUST be careful to make sure that their signatures
-- including argument names -- exactly match those specified here.


Output
~~~~~~

Hooks MAY print arbitrary informational text on stdout and
stderr. They MUST NOT read from stdin, and the build frontend MAY
close stdin before invoking the hooks.

The build frontend may capture stdout and/or stderr from the backend. If the
backend detects that an output stream is not a terminal/console (e.g.
``not sys.stdout.isatty()``), it SHOULD ensure that any output it writes to that
stream is UTF-8 encoded. The build frontend MUST NOT fail if captured output is
not valid UTF-8, but it MAY not preserve all the information in that case (e.g.
it may decode using the *replace* error handler in Python). If the output stream
is a terminal, the build backend is responsible for presenting its output
accurately, as for any program running in a terminal.
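
For example, a frontend that captures backend output might decode it
defensively along these lines (a non-normative sketch; the
``hook_runner.py`` script name is hypothetical)::

    import subprocess, sys

    proc = subprocess.run(
        [sys.executable, "hook_runner.py", "build_wheel"],  # hypothetical
        stdin=subprocess.DEVNULL,   # hooks MUST NOT read stdin
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # MUST NOT fail on undecodable bytes; "replace" loses some
    # information but keeps going, as permitted above.
    log_text = proc.stdout.decode("utf-8", errors="replace")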

If a hook raises any exception, or causes the process to terminate,
then this indicates that the operation has failed.


User-specified configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All hooks take a standard ``config_settings`` argument.

This argument is an arbitrary dictionary provided as an "escape hatch"
for users to pass ad-hoc configuration into individual package
builds. Build backends MAY assign any semantics they like to this
dictionary. Build frontends SHOULD provide some mechanism for users to
specify arbitrary string-key/string-value pairs to be placed in this
dictionary. For example, they might support some syntax like
``--package-config CC=gcc``. Build frontends MAY also provide
arbitrary other mechanisms for users to place entries in this
dictionary. For example, ``pip`` might choose to map a mix of modern
and legacy command line arguments like::

  pip install                                           \
    --package-config CC=gcc                             \
    --global-option="--some-global-option"              \
    --build-option="--build-option1"                    \
    --build-option="--build-option2"

into a ``config_settings`` dictionary like::

  {
   "CC": "gcc",
   "--global-option": ["--some-global-option"],
   "--build-option": ["--build-option1", "--build-option2"],
  }

Of course, it's up to users to make sure that they pass options which
make sense for the particular build backend and package that they are
building.
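
For example (purely illustrative -- neither the ``CC`` key nor these
defaults are defined by this PEP), a backend might consume the
dictionary like so::

    def build_wheel(wheel_directory, config_settings):
        # This backend chooses to treat a "CC" entry as selecting the
        # C compiler, and "--build-option" as a list of extra flags.
        compiler = config_settings.get("CC", "cc")
        extra_flags = config_settings.get("--build-option", [])
        ...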


Hook execution environment
~~~~~~~~~~~~~~~~~~~~~~~~~~

One of the responsibilities of a build frontend is to set up the
Python environment in which the build backend will run.

We do not require that any particular "virtual environment" mechanism
be used; a build frontend might use virtualenv, or venv, or no special
mechanism at all. But whatever mechanism is used MUST meet the
following criteria:

- All requirements specified by the project's build-requirements must
  be available for import from Python. In particular, the
  distributions specified in the ``pyproject.toml`` key
  ``build-system.requires`` must be made available to all hooks. Some
  hooks have additional requirements documented below.

- This must remain true even for new Python subprocesses spawned by
  the build environment, e.g. code like::

    import sys, subprocess
    subprocess.check_call([sys.executable, ...])

  must spawn a Python process which has access to all the project's
  build-requirements. For example, this is necessary to support build
  backends that want to run legacy ``setup.py`` scripts in a
  subprocess.

- All command-line scripts provided by the build-required packages
  must be present in the build environment's PATH. For example, if a
  project declares a build-requirement on `flit
  <https://flit.readthedocs.org/en/latest/>`__, then the following must
  work as a mechanism for running the flit command-line tool::

    import subprocess
    subprocess.check_call(["flit", ...])

A build backend MUST be prepared to function in any environment which
meets the above criteria. In particular, it MUST NOT assume that it
has access to any packages except those that are present in the
stdlib, or that are explicitly declared as build-requirements.
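
As a concrete (non-normative) illustration, one way a frontend might
satisfy these criteria is with the stdlib ``venv`` module plus
``pip``; the helper below is a sketch, and the POSIX ``bin/`` layout
is assumed::

    import os
    import subprocess
    import venv

    def make_build_env(env_dir, build_requirements):
        venv.EnvBuilder(with_pip=True).create(env_dir)
        python = os.path.join(env_dir, "bin", "python")  # "Scripts" on Windows
        if build_requirements:
            subprocess.check_call(
                [python, "-m", "pip", "install"] + list(build_requirements))
        return python  # run hooks in subprocesses of this interpreter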


Building an sdist
-----------------

Building an sdist involves three phases:

1. The frontend calls the backend's ``get_requires_for_build_sdist`` hook
   to query for any extra requirements that are needed for the sdist
   build.

2. The frontend obtains those requirements. For example, it might
   download them from PyPI and install them into some kind of virtual
   environment.

3. The frontend calls the backend's ``build_sdist`` hook to create the
   sdist.

If either hook is missing, or returns the built-in constant
``NotImplemented`` (note that this is the object ``NotImplemented``,
*not* the string ``"NotImplemented"``), then this indicates that this
backend does not support building an sdist from this source tree. For
example, some build backends might only support building sdists from a
VCS checkout, and not from an unpacked sdist. If this occurs then the
frontend should respond in whatever way it feels is appropriate. For
example, it might display an error to the user.
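
Putting the three phases together, a frontend's driver logic might
look something like this (a non-normative sketch;
``install_requirements_into_build_env`` is a hypothetical frontend
helper)::

    def frontend_build_sdist(backend, sdist_directory, config_settings):
        get_requires = getattr(backend, "get_requires_for_build_sdist", None)
        if get_requires is None:
            # Hooks appear in pairs, so build_sdist is absent too.
            return None
        requires = get_requires(config_settings)            # phase 1
        if requires is NotImplemented:
            return None
        install_requirements_into_build_env(requires)       # phase 2 (hypothetical)
        result = backend.build_sdist(sdist_directory, config_settings)  # phase 3
        return None if result is NotImplemented else result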


get_requires_for_build_sdist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

  def get_requires_for_build_sdist(config_settings):
      ...

Computes any additional requirements needed for ``build_sdist``.

Returns: a list of strings containing PEP 508 dependency
specifications, or ``NotImplemented``.

Execution environment: everything specified by the
``build-system.requires`` key in ``pyproject.toml``.

Example::

  def get_requires_for_build_sdist(config_settings):
      return ["cython"]

Or if there are no additional requirements beyond those specified in
``pyproject.toml``::

  def get_requires_for_build_sdist(config_settings):
      return []


build_sdist
~~~~~~~~~~~

::

  def build_sdist(sdist_directory, config_settings):
      ...

Builds a ``.tar.gz`` source distribution and places it in the
specified ``sdist_directory``.

Returns: The basename (not the full path) of the new ``.tar.gz`` file,
as a unicode string, or ``NotImplemented``.

Execution environment: everything specified by the
``build-system.requires`` key in ``pyproject.toml`` and by the return
value of ``get_requires_for_build_sdist``.

Notes:

A .tar.gz source distribution (sdist) is named like
``{name}-{version}.tar.gz`` (for example: ``foo-1.0.tar.gz``), and
contains a single top-level directory called ``{name}-{version}`` (for
example: ``foo-1.0``), which contains the source files of the
package. This directory must also contain the ``pyproject.toml`` from
the build directory, and a PKG-INFO file containing metadata in the
format described in `PEP 345
<https://www.python.org/dev/peps/pep-0345/>`_. Although historically
zip files have also been used as sdists, this hook should produce a
gzipped tarball. This is already the more common format for sdists,
and having a consistent format makes for simpler tooling, so build
backends MUST generate ``.tar.gz`` sdists.

The generated tarball should use the modern POSIX.1-2001 pax tar format, which
specifies UTF-8 based file names. This is not yet the default for the tarfile
module shipped with Python 3.6, so backends using the tarfile module need to
explicitly pass ``format=tarfile.PAX_FORMAT``.
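
For example, a backend using the ``tarfile`` module might create the
archive like this (assuming the sdist contents have already been
staged under ``foo-1.0/``)::

    import tarfile

    with tarfile.open("foo-1.0.tar.gz", "w:gz",
                      format=tarfile.PAX_FORMAT) as tar:
        tar.add("foo-1.0")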


Building a wheel
----------------

The interface for building a wheel is exactly analogous to that for
building an sdist: the same three phases, the same interpretation of
``NotImplemented``, etc., except of course that at the end it produces
a wheel instead of an sdist.


get_requires_for_build_wheel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

  def get_requires_for_build_wheel(config_settings):
      ...

Computes any additional requirements needed for ``build_wheel``.

Returns: a list of strings containing PEP 508 dependency
specifications, or ``NotImplemented``.

Execution environment: everything specified by the
``build-system.requires`` key in ``pyproject.toml``.

Example::

  def get_requires_for_build_wheel(config_settings):
      return ["wheel >= 0.25", "setuptools"]


build_wheel
~~~~~~~~~~~

::

  def build_wheel(wheel_directory, config_settings):
      ...

Builds a ``.whl`` binary distribution, and places it in the specified
``wheel_directory``.

Returns: the basename (not the full path) of the new ``.whl``, as a
unicode string, or ``NotImplemented``.

Execution environment: everything specified by the
``build-system.requires`` key in ``pyproject.toml`` and by the return
value of ``get_requires_for_build_wheel``.

Note: If you unpack an sdist named ``{name}-{version}.tar.gz``, and
then build a wheel from it, then the resulting wheel MUST be named
``{name}-{version}-{compat-info}.whl``.


Extensions
----------

Particular frontends and backends MAY coordinate to define additional
hooks beyond those described here, but they MUST NOT claim top-level
attributes on the build backend object to do so; these attributes are
reserved for future PEPs. Backends MAY provide an ``extensions`` dict,
and the semantics of the object at ``BACKEND.extensions["XX"]`` can be
defined by the project that owns the name ``XX`` on PyPI. For example,
the pip project could choose to define extension hooks like::

  BACKEND.extensions["pip"].get_wheel_metadata

or::

  BACKEND.extensions["pip"]["prepare_build_files"]


=====================================================
 Recommendations for build frontends (non-normative)
=====================================================

A build frontend MAY use any mechanism for setting up a build
environment that meets the above criteria. For example, simply
installing all build-requirements into the global environment would be
sufficient to build any compliant package -- but this would be
sub-optimal for a number of reasons. This section contains
non-normative advice to frontend implementors.

A build frontend SHOULD, by default, create an isolated environment
for each build, containing only the standard library and any
explicitly requested build-dependencies. This has two benefits:

- It allows for a single installation run to build multiple packages
  that have contradictory build-requirements. E.g. if package1
  build-requires pbr==1.8.1, and package2 build-requires pbr==1.7.2,
  then these cannot both be installed simultaneously into the global
  environment -- which is a problem when the user requests ``pip
  install package1 package2``. Or if the user already has pbr==1.8.1
  installed in their global environment, and a package build-requires
  pbr==1.7.2, then downgrading the user's version would be rather
  rude.

- It acts as a kind of public health measure to maximize the number of
  packages that actually do declare accurate build-dependencies. We
  can write all the strongly worded admonitions to package authors we
  want, but if build frontends don't enforce isolation by default,
  then we'll inevitably end up with lots of packages on PyPI that
  build fine on the original author's machine and nowhere else, which
  is a headache that no-one needs.

However, there will also be situations where build-requirements are
problematic in various ways. For example, a package author might
accidentally leave off some crucial requirement despite our best
efforts; or, a package might declare a build-requirement on ``foo >=
1.0`` which worked great when 1.0 was the latest version, but now 1.1
is out and it has a showstopper bug; or, the user might decide to
build a package against numpy==1.7 -- overriding the package's
preferred numpy==1.8 -- to guarantee that the resulting build will be
compatible at the C ABI level with an older version of numpy (even if
this means the resulting build is unsupported upstream). Therefore,
build frontends SHOULD provide some mechanism for users to override
the above defaults. For example, a build frontend could have a
``--build-with-system-site-packages`` option that causes the
``--system-site-packages`` option to be passed to
virtualenv-or-equivalent when creating build environments, or a
``--build-requirements-override=my-requirements.txt`` option that
overrides the project's normal build-requirements.

The general principle here is that we want to enforce hygiene on
package *authors*, while still allowing *end-users* to open up the
hood and apply duct tape when necessary.


===================================
 Comparison to competing proposals
===================================

The primary difference between this and competing proposals (in
particular, PEP 516) is
that our build backend is defined via a Python hook-based interface
rather than a command-line based interface.

We do *not* expect that this will, by itself, intrinsically reduce the
complexity of calling into the backend, because build frontends will
in any case want to run hooks inside a child process -- this is
important to isolate the build frontend itself from the backend code
and to better control the build backend's execution environment. So
under both proposals, there will need to be some code in ``pip`` to
spawn a subprocess and talk to some kind of command-line/IPC
interface, and there will need to be some code in the subprocess that
knows how to parse these command line arguments and call the actual
build backend implementation. So this diagram applies to all proposals
equally::

  +-----------+          +---------------+           +----------------+
  | frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
  |   (pip)   |          |   interface   |           | implementation |
  +-----------+          +---------------+           +----------------+



The key difference between the two approaches is how these interface
boundaries map onto project structure::

  .-= This PEP =-.

  +-----------+          +---------------+    |      +----------------+
  | frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
  |   (pip)   |          |   interface   |    |      | implementation |
  +-----------+          +---------------+    |      +----------------+
                                              |
  |______________________________________|    |
     Owned by pip, updated in lockstep        |
                                              |
                                              |
                                   PEP-defined interface boundary
                                 Changes here require distutils-sig


  .-= Alternative =-.

  +-----------+    |     +---------------+           +----------------+
  | frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
  |   (pip)   |    |     |   interface   |           | implementation |
  +-----------+    |     +---------------+           +----------------+
                   |
                   |     |____________________________________________|
                   |      Owned by build backend, updated in lockstep
                   |
      PEP-defined interface boundary
    Changes here require distutils-sig


By moving the PEP-defined interface boundary into Python code, we gain
three key advantages.

**First**, because there will likely be only a small number of build
frontends (``pip``, and... maybe a few others?), while there will
likely be a long tail of custom build backends (since these are chosen
separately by each package to match their particular build
requirements), the actual diagrams probably look more like::

  .-= This PEP =-.

  +-----------+          +---------------+           +----------------+
  | frontend  | -spawn-> | child cmdline | -Python+> |    backend     |
  |   (pip)   |          |   interface   |        |  | implementation |
  +-----------+          +---------------+        |  +----------------+
                                                  |
                                                  |  +----------------+
                                                  +> |    backend     |
                                                  |  | implementation |
                                                  |  +----------------+
                                                  :
                                                  :

  .-= Alternative =-.

  +-----------+          +---------------+           +----------------+
  | frontend  | -spawn+> | child cmdline | -Python-> |    backend     |
  |   (pip)   |       |  |   interface   |           | implementation |
  +-----------+       |  +---------------+           +----------------+
                      |
                      |  +---------------+           +----------------+
                      +> | child cmdline | -Python-> |    backend     |
                      |  |   interface   |           | implementation |
                      |  +---------------+           +----------------+
                      :
                      :

That is, this PEP leads to less total code in the overall
ecosystem. And in particular, it reduces the barrier to entry of
making a new build system. For example, this is a complete, working
build backend::

    # mypackage_custom_build_backend.py
    import os.path
    import pathlib
    import tarfile

    def get_requires_for_build_wheel(config_settings):
        return ["wheel"]

    def build_wheel(wheel_directory, config_settings):
        from wheel.archive import archive_wheelfile
        basename = "mypackage-0.1-py2.py3-none-any"
        path = os.path.join(wheel_directory, basename)
        # archive_wheelfile appends ".whl" to the path it's given
        archive_wheelfile(path, "src/")
        return basename + ".whl"

    def _exclude_hidden_and_special_files(archive_entry):
        """Tarfile filter to exclude hidden and special files from the
        archive."""
        if archive_entry.isfile() or archive_entry.isdir():
            if not os.path.basename(archive_entry.name).startswith("."):
                return archive_entry
        return None

    def get_requires_for_build_sdist(config_settings):
        return []

    def build_sdist(sdist_directory, config_settings):
        sdist_subdir = "mypackage-0.1"
        sdist_name = sdist_subdir + ".tar.gz"
        sdist_path = pathlib.Path(sdist_directory) / sdist_name
        with tarfile.open(str(sdist_path), "w:gz",
                          format=tarfile.PAX_FORMAT) as sdist:
            # Tar up the whole source tree, minus hidden and special files
            sdist.add(os.getcwd(), arcname=sdist_subdir,
                      filter=_exclude_hidden_and_special_files)
        return sdist_name

Of course, this is a *terrible* build backend: it requires the user to
have manually set up the wheel metadata in
``src/mypackage-0.1.dist-info/``; when the version number changes it
must be manually updated in multiple places... but it works, and more features
could be added incrementally. Much experience suggests that large successful
projects often originate as quick hacks (e.g., Linux -- "just a hobby,
won't be big and professional"; `IPython/Jupyter
<https://en.wikipedia.org/wiki/IPython#Grants_and_awards>`_ -- `a grad
student's ``$PYTHONSTARTUP`` file
<http://blog.fperez.org/2012/01/ipython-notebook-historical.html>`_),
so if our goal is to encourage the growth of a vibrant ecosystem of
good build tools, it's important to minimize the barrier to entry.


**Second**, because Python provides a simpler yet richer structure for
describing interfaces, we remove unnecessary complexity from the
specification -- and specifications are the worst place for
complexity, because changing specifications requires painful
consensus-building across many stakeholders. In the command-line
interface approach, we have to come up with ad hoc ways to map
multiple different kinds of inputs into a single linear command line
(e.g. how do we avoid collisions between user-specified configuration
arguments and PEP-defined arguments? how do we specify optional
arguments? when working with a Python interface these questions have
simple, obvious answers). When spawning and managing subprocesses,
there are many fiddly details that must be gotten right, subtle
cross-platform differences, and some of the most obvious approaches --
e.g., using stdout to return data for the ``build_requires`` operation
-- can create unexpected pitfalls (e.g., what happens when computing
the build requirements requires spawning some child processes, and
these children occasionally print an error message to stdout?
obviously a careful build backend author can avoid this problem, but
the most obvious way of defining a Python interface removes this
possibility entirely, because the hook return value is clearly
demarcated).

In general, the need to isolate build backends into their own process
means that we can't remove IPC complexity entirely -- but by placing
both sides of the IPC channel under the control of a single project,
we make it much cheaper to fix bugs in the IPC interface than if
fixing bugs requires coordinated agreement and coordinated changes
across the ecosystem.

**Third**, and most crucially, the Python hook approach gives us much
more powerful options for evolving this specification in the future.

For concreteness, imagine that next year we add a new ``build_wheel2``
hook, which replaces the current ``build_wheel`` hook with something
that adds new features (for example, the ability to build multiple
wheels from the same source tree). In order to manage the transition,
we want it to be possible for build frontends to transparently use
``build_wheel2`` when available and fall back onto ``build_wheel``
otherwise; and we want it to be possible for build backends to define
both methods, for compatibility with both old and new build frontends.

Furthermore, our mechanism should also fulfill two more goals: (a) If
new versions of e.g. ``pip`` and ``flit`` are both updated to support
the new interface, then this should be sufficient for it to be used;
in particular, it should *not* be necessary for every project that
*uses* ``flit`` to update its individual ``pyproject.toml`` file. (b)
We do not want to have to spawn extra processes just to perform this
negotiation, because process spawns can easily become a bottleneck when
deploying large multi-package stacks on some platforms (Windows).

In the interface described here, all of these goals are easy to
achieve. Because ``pip`` controls the code that runs inside the child
process, it can easily write it to do something like::

    command, backend, args = parse_command_line_args(...)
    if command == "build_wheel":
        if hasattr(backend, "build_wheel2"):
            backend.build_wheel2(...)
        elif hasattr(backend, "build_wheel"):
            backend.build_wheel(...)
        else:
            ...  # error handling

In the alternative where the public interface boundary is placed at
the subprocess call, this is not possible -- either we need to spawn
an extra process just to query what interfaces are supported (as was
included in an earlier draft of PEP 516, an alternative to this), or
else we give up on autonegotiation entirely (as in the current version
of that PEP), meaning that any changes in the interface will require
N individual packages to update their ``pyproject.toml`` files before
any change can go live, and that any changes will necessarily be
restricted to new releases.


====================
 Evolutionary notes
====================

A goal here is to make it as simple as possible to convert old-style
sdists to new-style sdists. (E.g., this is one motivation for
supporting dynamic build requirements.) The ideal would be that there
would be a single static ``pyproject.toml`` that could be dropped into any
"version 0" VCS checkout to convert it to the new shiny. This is
probably not 100% possible, but we can get close, and it's important
to keep track of how close we are... hence this section.

A rough plan would be: Create a build system package
(``setuptools_pypackage`` or whatever) that knows how to speak
whatever hook language we come up with, and translates those hooks
into calls to ``setup.py``. This will probably require some sort of
hooking or monkeypatching of setuptools to provide a way to extract
the ``setup_requires=`` argument when needed, and to provide a new
version of the sdist command that generates the new-style format. This
all seems doable and sufficient for a large proportion of packages
(though obviously we'll want to prototype such a system before we
finalize anything here). (Alternatively, these changes could be made
to setuptools itself rather than going into a separate package.)

But there remain two obstacles that mean we probably won't be able to
automatically upgrade packages to the new format:

1) There currently exist packages which insist on particular packages
   being available in their environment before setup.py is
   executed. This means that if we decide to execute build scripts in
   an isolated virtualenv-like environment, then projects will need to
   check whether they do this, and if so then when upgrading to the
   new system they will have to start explicitly declaring these
   dependencies (either via ``setup_requires=`` or via static
   declaration in ``pyproject.toml``).

2) There currently exist packages which do not declare consistent
   metadata (e.g. ``egg_info`` and ``bdist_wheel`` might get different
   ``install_requires=``). When upgrading to the new system, projects
   will have to evaluate whether this applies to them, and if so they
   will need to stop doing that.


================================
 Rejected and deferred features
================================

A number of potential extra features were discussed beyond the
above. For the most part the decision was made that it was better to
defer trying to implement these until we had more experience with the
basic interface, and to provide a minimal extension interface (the
``extensions`` dictionary) that will allow us to prototype these
features before standardizing them. Specifically:

* Editable installs: This PEP originally specified another hook,
  ``install_editable``, to do an editable install (as with ``pip
  install -e``). It was removed due to the complexity of the topic,
  but may be specified in a later PEP.

  Briefly, the questions to be answered include: what reasonable ways
  exist of implementing an 'editable install'? Should the backend or
  the frontend pick how to make an editable install? And if the
  frontend does, what does it need from the backend to do so?

* Getting wheel metadata from a source tree without building a wheel:
  it's believed that when pip adds a backtracking constraint solver
  for package dependencies, it may be useful to add a hook to query a
  source tree to get metadata about the wheel that it *would*
  generate, if it were asked to build a wheel. Specifically, the kind of
  situation where it's anticipated that this might come up is:

  1. Package A depends on B and C==1.0
  2. B is only available as an sdist
  3. We fetch the sdist for the latest version of B, build it into a
     wheel, and then discover that it depends on C==1.5, which means
     that it isn't compatible with this version of A.
  4. We fetch the sdist for the latest-but-one version of B, build it
     into a wheel, and then discover that it depends on C==1.4, which
     means that it isn't compatible with this version of A.
  5. We fetch the sdist for the latest-but-two version of B...

  The idea would be that we could reduce (but not eliminate) the cost
  of steps 3, 4, 5, ... if there were a way to query a build backend
  to find out the requirements without actually building a wheel,
  which is a potentially expensive operation.

  Of course, these repeated fetches are expensive no matter what we
  do, so the ideal solution would be to provide wheels for B, so that
  none of this needs to be done at all. And for many packages (for
  example, pure Python packages), building a wheel is nearly as cheap
  as fetching the metadata. And building a wheel also has the
  advantage of giving us something we can store in the wheel cache for
  next time. But perhaps this is still a good idea for packages that
  are particularly slow to build (for example, complex packages like
  scipy or qt).

  It was eventually decided to defer this for now, since it adds
  non-trivial complexity for build backends (the metadata fetching
  phase and the wheel building phase run at different times, yet have
  to produce consistent results), and until pip's backtracking
  resolver is actually implemented, we're only guessing at the value
  of this optimization and the exact semantics it will require.

* A specialized hook for copying a source tree into a new source tree:
  in certain cases, like when installing directly from a local VCS
  checkout, pip prefers to copy the source tree to a temporary
  directory before building it. This provides some protection against
  build systems that can give incorrect results when repeatedly
  building in the same tree. Historically, pip has accomplished this
  copy using a simple ``shutil.copytree``, but this causes various
  problems, like copying large git checkouts or intermediate artifacts
  from previous in-place builds. In the future, therefore, pip might
  move to a multi-step process like:

  1. Create an sdist from the VCS checkout
  2. Unpack this sdist into a temporary directory.
  3. Build a wheel from the unpacked sdist.
  4. Install the wheel.

  Even better, this provides some guarantee that going from VCS
  checkout → sdist → wheel will produce identical results to going
  directly from VCS checkout → wheel.

  However, this runs into a potential problem: what if this particular
  combination of source tree + build backend can't actually build an
  sdist? (For example, `flit <http://flit.readthedocs.org/>`__ may
  have this limitation for certain trees unpacked from sdists.)
  Therefore, we considered adding an optional hook like
  ``prepare_temporary_tree_for_build_wheel`` that would copy the
  required source files into a specified temporary directory.

  But:

  * Such a hook would add non-trivial complexity to this spec: it
    requires us to promote the idea of an "out of tree build" to a
    first class concept, and specify which kinds of trees are required
    to support which operations, etc.
  * A major motivation for doing the build-sdist-unpack-sdist dance in the
    first place is that we don't trust the backend code to produce the
    same result when building from a VCS checkout as when building
    from an sdist, but if we don't trust the backend then it seems odd
    to add a special hook that puts the backend in charge of doing the
    dance.
  * If sdist creation is unsupported, then pip can fall back on a
    ``shutil.copytree`` strategy in just a few lines of code.
  * And in fact, for the one known case where this might be a problem
    (unpacked sdist using flit), ``shutil.copytree`` is essentially
    optimal.
  * Though in fact for flit, this is still a pointless expense -- doing
    an in-place build is perfectly safe and even more efficient.
  * Plus projects using flit always have wheels, so this will
    essentially never even come up in the first place.
  * And pip hasn't even implemented the sdist optimization for legacy
    ``setup.py``\-based projects yet, so we have no operational
    experience to refer to and it might turn out there are some
    unknown-unknowns that we'll want to take into account before
    standardizing an optimization for it here.

  And since this would be an optional hook anyway, it's just as easy
  to add later once the exact parameters are better understood.

* There was some discussion of extending these hooks to allow a single
  source tree to produce multiple wheels. But this is a complex enough
  topic that it clearly requires its own PEP.

* We also discussed making the wheel and sdist hooks build unpacked
  directories containing the same contents as their respective
  archives. In some cases this could avoid the need to pack and unpack
  an archive, but this seems like premature optimisation. It's
  advantageous for tools to work with archives as the canonical
  interchange formats (especially for wheels, where the archive format
  is already standardised). Close control of archive creation is
  important for reproducible builds. And it's not clear that tasks
  requiring an unpacked distribution will be more common than those
  requiring an archive.


===========
 Copyright
===========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:


-n

-- 
Nathaniel J. Smith -- https://vorpus.org