Re: [Distutils] A possible refactor/streamlining of PEP 517

Paul Moore Sat, 15 Jul 2017 03:55:51 -0700

On 15 July 2017 at 10:42, Nathaniel Smith <n...@pobox.com> wrote:
> Hi Paul,
>
> We seem to have some really fundamental miscommunication here;
> probably we should figure out what that is instead of continuing to
> talk past each other.


Agreed. Thanks for summarising your understanding. Let's see if I can
clarify what I'm saying.

> As a wild guess... can you define what an "out-of-place build" means to you?

I'm going to do this without reference to your explanation, as trying
to put things in the context of what you say is where I'm getting
messed up. I'll comment on your explanations below.

Again, I'll start with some background. My concern is where we're
trying to deal with a user doing "pip install ." on their development
directory. This is not a core use case for pip, and honestly I don't
feel it's the right workflow (people wanting to work on code and
install it as they go along should be using editable installs, IMO),
but it is something we see people doing, and I don't want to break
that workflow arbitrarily. Given that this is the case we're talking
about, my experience is that working directories contain all sorts of
clutter - small test files I knocked up, experimental changes I
discarded, etc. That may simply reflect the way I work, but comments
I've seen indicate that I'm not *completely* alone. So for me, the
point here is about making sure that "pip install ." in a cluttered
working directory results in "what the developer wants".

For me the key property I'm looking for is that the developer gets
consistent results for the build commands (i.e., build_wheel and
build_sdist->build a wheel give the same wheel). This is important for
a number of reasons - to avoid publishing errors where the developer
builds a wheel and a sdist and deploys them, to ensure that tox (which
uses sdists) gives the same results as manually running the tests,
etc. In one of your posts you characterised the sorts of discrepancies
I'm trying to avoid as "weird errors" and that's precisely the point -
we get confused users raising issues and I want to avoid that
happening.
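
To make the property concrete, here's a rough sketch of the check I have in mind. It compares a wheel built directly with build_wheel against one built by running build_sdist, unpacking the result and running build_wheel on that. `wheel_digests` and `compare_wheels` are hypothetical helpers for illustration, not anything in pip. RECORD is skipped because it only lists hashes of the other members, so any real difference already shows up elsewhere.

```python
import hashlib
import zipfile


def wheel_digests(path):
    """Map each archive member of a wheel to a SHA-256 digest of its contents."""
    digests = {}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            # RECORD just lists hashes of the other members; skip it.
            if name.endswith(".dist-info/RECORD"):
                continue
            digests[name] = hashlib.sha256(zf.read(name)).hexdigest()
    return digests


def compare_wheels(direct_wheel, via_sdist_wheel):
    """Return (only_in_direct, only_in_via_sdist, differing) member names."""
    a = wheel_digests(direct_wheel)
    b = wheel_digests(via_sdist_wheel)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_a, only_b, differing
```

If all three lists come back empty, the developer gets "the same wheel" whichever route is taken. A stray scratch file turning up in `only_in_direct` is exactly the kind of "weird error" I'd like to avoid.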

So, with that in mind, the distinction between an "in place" and an
"out of place" build is that an in-place build simply trusts that the
developer's directory will deliver consistent results, whereas an
out-of-place build does the build in a separate location that we've
asked the backend to ensure doesn't contain unexpected files. It has
nothing to do with repeated builds (but see below).

> For me, the distinction between an in-place and out-of-place build is,
> ... well, first some background to make sure my terminology is clear:
> build systems typically work by taking a source tree as input and
> executing a series of rules to generate intermediate artifacts and
> eventually the final artifacts. Commonly, as an optimization, they
> have some system for caching these intermediate artifacts, so that
> future builds can go faster (called "incremental builds"). However,
> this optimization is often heuristic-based and therefore introduces a
> risk: if the system re-uses a cached artifact that it should have
> rebuilt, then this can generate a broken build.
>
> There are two popular strategies for storing this cache, and this is
> what "in-place" versus "out-of-place" refers to.
>
> "In-place builds" have a single cache that's stored inside the source
> tree -- often, but not always, intermingled with the source files. So
> a classic 'make'-based build where you end up with .o files next to
> all your .c files is an in-place build, and so is 'python setup.py
> build' putting a bunch of .o files inside the build/ directory.
>
> "Out-of-place builds" instead place the cached artifacts into a
> designated separate directory. The advantage of this is that you can
> potentially work around limitations of the caching strategy by having
> multiple caches and switching between them.
>
> [In traditional build systems the build tree concept is also often
> intermingled with the idea of a "build configuration", like debug
> versus optimized builds and this changes the workflow in various --
> but we don't have those and it's a whole extra set of complexity so
> let's ignore that.]
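
As a toy illustration of the caching risk described above (not how any particular build tool works), here is the classic make-style timestamp heuristic. It rebuilds an artifact only if the artifact is missing or older than the one source it was told about.

```python
import os


def needs_rebuild(source, artifact):
    """Make-style check: rebuild if the artifact is missing or older than source.

    Only the single declared dependency is consulted. Anything else the
    artifact really depends on (headers, generated files) is invisible here.
    """
    if not os.path.exists(artifact):
        return True
    return os.path.getmtime(source) > os.path.getmtime(artifact)
```

If foo.c includes foo.h but the rule only names foo.c, editing foo.h leaves needs_rebuild() returning False and the stale foo.o gets reused. That is the "re-uses a cached artifact that it should have rebuilt" failure. Real build tools track dependencies far more thoroughly, but they still rely on a model of what each artifact depends on.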

For me, all of the above comes under the heading of "incremental
builds", and I'm considering that out of scope. Specifically, pip's
current behaviour offers no (documented) means of choosing between
incremental or clean builds, and users who want that level of control
should be building with the backend tools (setuptools) directly, and
only using pip for the install step once a wheel has been built.

If and when we discuss a UI in pip for requesting incremental or clean
builds, we'd look at the implications on the backend hooks at that
point - but I'm not sure that'll ever be something we want to do, as
it seems like a use case where we'd always want users to work directly
with the backend (but that's just my opinion).

> Corollaries:
>
> - if you're starting with a pristine source tree, then "in-place" and
> "out-of-place" builds will produce exactly the same results, because
> they're running exactly the same rules. (This is why I'm confused
> about why you seem to be claiming that out-of-place builds will help
> developers avoid bugs that happen with in-place builds... they're
> exactly the same thing!)

Agreed, but I'm concerned about build trees that *aren't* "pristine",
insofar as they are working directories for development. All of your
corollaries depend on the idea that you have a "pristine" build tree,
and that's where our confusion lies, I suspect.

> - if you've done an out-of-place build in a given tree, you can return
> to a pristine source tree by deleting the out-of-place directory and
> making a new one, without having to deal with the build backend. if
> you've done an in-place build in a given tree, then you need something
> like a "make clean" rule. But if you have that, then these are
> identical, which is why I said that it sounded like pip would be just
> as happy with a way to do a clean build. (I'm not saying that the spec
> necessarily needs a way to request a clean build -- this is just
> trying to understand what the space of options actually is.)

Again, agreed but irrelevant, as "pristine" is not the case that concerns me.

> - if you're starting with a pristine source tree, and your goal is to
> end up with a wheel *while keeping the original tree pristine*, then
> some options include: (a) doing a copytree + in-place build on the
> copy, like pip does now, (b) making an sdist and then doing an
> in-place build

Same again

> - if you're not starting with a pristine source tree -- like say the
> user went and did an in-place build here before invoking pip -- then
> you have very few good options. Copytree + in-place build will
> hopefully work, but there's a chance you'll pick up detritus from the
> previous build. Out-of-tree-builds might or might not work -- I've
> given several examples of extant build systems that explicitly
> disclaim any reliability in this case. Sdist + build the sdist is
> probably the most reliable option here, honestly, when it's an option.

Your idea of a "not pristine" tree differs from mine - having done an
in-place build is the most innocuous example of a non-pristine tree as
far as I'm concerned, and the easiest to deal with (make clean).

* Copytree is certain *not* to work, because it copies all the things
that make the tree not pristine.
* Build sdist and unpack is pip's current planned approach, but Thomas
had concerns that building a sdist can't always be guaranteed to be
possible. We do *not* want to have cases where pip can't build a wheel
even though build_wheel would have worked, which means build sdist and
unpack is a problem.
* Ask the backend to make a "clean" directory would work (the backend
should know what it needs) - that was the prepare_directory hooks. But
that got too complex.
* Tell the backend we want a build that's isolated from the source
directory and trust it to do the right thing is where we've currently
ended up.

Based on the current discussion, however, I now have concerns that either

a) Backend developers might not understand what build_directory is
requesting, or
b) The PEP doesn't define the semantics of build_directory in a way
that delivers the results I'm suggesting here

Having had this discussion, and re-read the current draft of the PEP,
I do in fact think that (b) is the case. That worries me, because I
don't think I'm the only one who has made that mistake. Nick has just
posted a message saying

> Requesting an out-of-tree wheel build is then just a way for a
> frontend to say to the backend "Hey, please build the wheel *as if*
> you'd exported an sdist and then built that, even if you can't
> actually export an sdist right now".

which is exactly what I'd expected. But the PEP doesn't say that.
Specifically, in the PEP:

> When a build_directory is provided, the backend should not create or modify 
> any files in the source directory (the working directory where the hook is 
> called). If the backend cannot reliably avoid modifying the directory it 
> builds from, it should copy any files it needs to build_directory and perform 
> the build there.

The statement "it should copy any files it needs" is correct (but more
subtle than it looks - it doesn't emphasise that the backend must not
copy files it *doesn't* need, i.e., the developer clutter I'm
concerned about). But the statement about "If the backend cannot
reliably avoid modifying the directory it builds from" is misleading -
the reason has *nothing* to do with whether backends can modify the
source directory, and everything to do with whether backends can
reasonably guarantee that there's nothing that would cause
inconsistencies.
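
To pin down the semantics I'd expect, here's a minimal sketch of what "copy any files it needs" should mean. It assumes the backend keeps an explicit list of its build inputs. `copy_declared_files` and its `declared_files` parameter are hypothetical and not part of the PEP's hook signature; in a real backend the list would come from its own configuration, ideally the same list it uses to make an sdist.

```python
import os
import shutil


def copy_declared_files(source_dir, build_directory, declared_files):
    """Copy only the files the backend has declared as build inputs.

    declared_files is a list of paths relative to source_dir. Anything else
    in the working directory (scratch scripts, abandoned experiments) is
    never copied, and nothing in source_dir is created or modified.
    """
    copied = []
    for rel in declared_files:
        src = os.path.join(source_dir, rel)
        dest = os.path.join(build_directory, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(rel)
    return copied
```

The contrast with a plain shutil.copytree(source_dir, build_directory) is the whole point. copytree faithfully reproduces the clutter, whereas the backend knows which files actually matter.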

One particularly frustrating aspect of this discussion is that the
worst offender for "wheel and sdist are inconsistent" is the way that
setuptools requires developers to specify build and sdist contents
separately (setup.py vs MANIFEST.in). That duplication is an obvious
source of potential inconsistencies, and precisely why we get most of
the reports we see. Ideally, new backends would not design in such
inconsistency[1], which means it's easy to see such inconsistencies as
"that should never happen" or "I don't understand the problem". But we
will have to deal with the possibility of such backends, and the
setuptools model isn't *that* unusual (setuptools didn't invent the
file MANIFEST.in, it just reused the name for its own purpose).

[1] I don't know enough about flit to be sure, but if the developer
forgets to check in a new source file, would it be possible for that
source file to be in the wheel but not in the sdist?
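
To make that question concrete, here's a toy model of how a backend can end up inconsistent. The wheel is built from whatever is in the package directory on disk, while the sdist comes from a separate list: a MANIFEST.in expansion for setuptools, or, hypothetically, the files under version control. This is *not* a description of flit's actual logic (which, as I say, I'm not sure of). `package_files_on_disk` and `missing_from_sdist` are made-up helpers for illustration.

```python
import os


def package_files_on_disk(source_dir, package):
    """Everything under the package directory, as a direct wheel build might see it."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(os.path.join(source_dir, package)):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), source_dir)
            found.add(rel.replace(os.sep, "/"))
    return found


def missing_from_sdist(source_dir, package, sdist_files):
    """Files a direct wheel would include but a wheel built from the sdist would lack.

    sdist_files stands in for whatever the sdist is built from. Extra
    sdist-only files such as tests or a README are expected and not reported.
    """
    return sorted(package_files_on_disk(source_dir, package) - set(sdist_files))
```

With a new, un-checked-in pkg/new_module.py, the direct wheel contains it and a wheel rebuilt from the sdist doesn't. That is the discrepancy that building "as if from an sdist" would avoid.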

> Does that make sense? Does it... help explain any of the ways we're
> talking past each other?

It does, a lot. Thanks.

Paul
_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
