Re: [Distutils] A possible refactor/streamlining of PEP 517

Donald Stufft Thu, 06 Jul 2017 10:20:01 -0700

> On Jul 6, 2017, at 12:35 PM, Thomas Kluyver <tho...@kluyver.me.uk> wrote:
> 
> Thanks Nick for the detailed reply. I have read it carefully, and you've
> probably convinced me to get back on board. Some more responses inline:
> 
> On Thu, Jul 6, 2017, at 03:38 PM, Nick Coghlan wrote:
>> While I can completely understand how the current debate over whether
>> the prepare_input_for_build_wheel hook is necessary
>> would make you feel that way, I hope I can convince you that we're
>> really just quibbling over a genuinely trivial arcane technical detail
>> that I'd never let get in the way of flit being a full-fledged
>> participant in the Python packaging ecosystem.
> 
> To be clear, I don't particularly care for the hook. I can see that it's
> something of a kludge between two competing approaches.
> 
> What is important to me is that if a user installs from source the
> obvious way (pip install . ), failure to build an sdist does not result
> in a failure to install. The extra hook was one approach to that, but
> it's also OK by me if it tries to make an sdist and falls back to either
> copytree or an inplace build.


I *think* this would be reasonable to me if we had some way to distinguish an
expected failure from an unexpected one. I wouldn’t want the fallback to
trigger on just any failure, but if we used Nathaniel’s NotImplemented idea or
something similar to separate “hey, I can’t build an sdist here for expected
reasons” from “hey, I tried to build the sdist, but something went wrong”, I
think that would be workable.

In pip, I think we’d most likely implement the fallback as a copytree, at
least to start; once we have more experience with other build backends, that
could possibly be relaxed to an in-place build.
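
A minimal sketch of how a frontend might tell the two kinds of failure apart,
assuming a backend hook that returns NotImplemented when it cannot build an
sdist for expected reasons. The names example_build_sdist and
prepare_build_tree are hypothetical and are not part of the PEP or of pip;
unexpected failures are modelled here simply as exceptions that propagate.

```python
import shutil
import tempfile
from pathlib import Path


def example_build_sdist(source_dir, sdist_dir):
    """Hypothetical backend hook: declines because no VCS metadata is available."""
    return NotImplemented


def prepare_build_tree(build_sdist, source_dir):
    """Return (strategy, path) describing where the wheel should be built from.

    Assumption: the hook returns NotImplemented for an expected failure and
    raises an exception for an unexpected one.
    """
    work = Path(tempfile.mkdtemp(prefix="build-"))
    # Any exception raised here is an unexpected failure and is not caught.
    result = build_sdist(source_dir, work / "sdist")
    if result is NotImplemented:
        # Expected failure: fall back to copying the whole source tree.
        target = work / "tree"
        shutil.copytree(source_dir, target)
        return "copytree", target
    # The backend produced an sdist; a real frontend would unpack it next.
    return "sdist", Path(result)
```

With this shape, a backend that raises (say, because the tarball could not be
written) still fails the install loudly, while one that returns NotImplemented
quietly gets the copytree path.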

> 
>> That is, the current point of contention is specifically about how we
>> want tools to behave when we're starting with a source directory that:
>> 
>> 1. Doesn't include VCS metadata (e.g. it's been exported as a tarball
>> rather than cloned)
>> 2. The build frontend doesn't want to use as the basis for an in-place
>> build
>> 3. The build frontend doesn't want to blindly copy into a separate
>> build directory
>> 
>> So just by way of those preconditions, we're already well outside the
>> most common package installation workflows.
> 
> One of my concerns in this debate is that this is presented as a very
> rare corner case that we don't have to worry about too much. I agree
> that it's not the most common case, but I think it's common enough that
> we should care about making it easy, given that:
> 
> - Condition 1 also covers directories with VCS metadata where the VCS
> tools are not on $PATH. Another case occurred to me recently: Windows
> users who have installed git but not added it to the default PATH.
> - Conditions 2 and 3 seem likely to be the default for a source install
> with pip.
> 
> As an order of magnitude, I'd estimate this is ~10% of installs from a
> source directory - which is to say, moderately common.

Unfortunately, metrics are hard in OSS. I’d love for pip to have metrics so we
could bring real numbers to the discussion and figure out which cases are more
common than others, and by how much. I do know that pip downloaded 12 million
sdists from PyPI yesterday (and 28 million wheels), but how that compares to
the number of people doing ``pip install .`` for varying states of the tree in
``.`` is something we can only guess at.

> 
>> That perspective is embodied in the hypothetical proposal to add a
>> "--build-strategy" option to pip that would allow folks building
>> wheels to choose between:
>> 
>> - creating and unpacking an sdist and building a wheel from that
>> - copying the directory tree and building a wheel from that
>> - building a wheel directly from the original directory
>> 
>> (Perhaps with a variant that tries to create and unpack the sdist
>> first, and only if that fails falls back to copying the entire tree)
> 
> This could be useful flexibility for advanced users. But I worry that
> pip will use the 'sdist' build strategy by default, and expect users to
> handle cases where that fails. I think this would be a mistake. From a
> user perspective, it would mean:
> 
> - "pip install ." is the recommended way to install from source, but in
> some situations it doesn't work.
> - Adding the mystic incantation "--build-strategy direct" makes it work,
> and from a user perspective makes absolutely no difference to the
> result.
> 
> Of course, I also have a vested interest in things not working this way:
> I would get a steady trickle of people asking "why does flit require a
> VCS to install from source?" From my perspective, it doesn't require
> that, but I would be unable to 'fix' it.

From my perspective, I would prefer not to add a --build-strategy flag [1] to
pip and would rather have some generic solution that either just works OR
raises a clear error. I agree: I suspect that for most people this flag would
just end up being some “make it work” turd they cargo cult around (which likely
made one scenario work, but broke another). Maybe it’s useful as something for
advanced users, but that’s more of a pip discussion than a discussion for this
PEP.

> 
> Donald:
>> I think it is a complete non-starter to suggest removing installation from 
>> sdist support from pip
> 
> I'm certainly not suggesting that (hopefully this was already clear, but
> just in case ;-)

Oh no, I didn’t think you were advocating for that. Rather, I was trying to
explain why I arrive at the “go via sdist” route: I start at “How can we
eliminate additional routes a package takes from ‘VCS’ to ‘Installed’?”, and
since I don’t think we can get rid of sdist, my mind immediately goes to
“well, can we make everything go through sdist?”.

> 
>> the question then becomes do we want to try and push things towards only 
>> having *one* primary flow through the state machine of Python’s packaging, 
>> or do we want to support transitions that allow you to “skip” steps. 
> 
> My idealised view of the state machine is something like this:
> 
> wheel <-- source tree <--> sdist
> 
> I agree that there's a problem with losing important data when you go
> [source tree --> sdist --> source tree] - in fact this is one of the
> pain points I was trying to avoid with flit. But I don't like the idea
> of solving that by saying that all wheels must have passed through an
> sdist; it feels like a redundant there-and-back-again journey.
> 
> So how else could we tackle the systematic problem? It's definitely a
> good idea to ensure that [stree --> sdist --> stree --> wheel] doesn't
> miss out anything that [stree --> wheel] includes, but I'd focus on
> doing this in developer tools, e.g.:
> 
> 1. Tools such as flit could check it when you're building a release
> 2. Tools running on CI services could build both and compare them
> 3. Bots could scan PyPI for projects with both a .whl and a .tar.gz,
> build a wheel from the tarball, compare them, and notify the maintainer
> if there's a problem.
> 
> In the short term, I reckon that 2 is the most promising - we can make a
> convenient pip-installable tool and promote it as good practice for
> testing that your builds work. But in any case, I see a range of options
> for tackling this while leaving open the direct [stree --> wheel]
> pathway.

Yeah, I absolutely don’t think going through sdist is the *only* way to tackle
the problem.

It’s attractive to me because in my mind it is entirely automatic, so it
doesn’t require a hypothetical developer to learn another tool, set up
infrastructure, etc. to handle it. The common stumbling block I see people (new
and experienced alike) hit is when ``pip install .`` and
``pip install foo-1.0.tar.gz`` produce different results. Focusing on the
developer side provides tooling that helps them detect when they’ve done
something that might trigger that, but doesn’t actively prevent it.

A similar-ish scenario: in the future I hope to start validating the rendering
of long_description on upload to PyPI and rejecting invalid syntax.
readme_renderer exists and people can use it (and it lets them detect problems
earlier on), but forcing all uploads to PyPI to have their long_description
checked completely sidesteps that class of problems.

If things don’t go the way I would prefer and we decide that we’re going to
just deal with the problems that “many paths” create (because as a collective,
we liked the tradeoffs better), then I think that (2) is likely to be a good
“second best” solution in my mind.
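
As a rough sketch of what the comparison step in option (2) could look like,
assuming the two wheels have already been built (one directly from the source
tree, one from an unpacked sdist): the function names below are hypothetical,
and comparing member names and CRC-32 values is only one possible notion of
“the same wheel”.

```python
import zipfile


def wheel_contents(wheel):
    """Map each file in a wheel (a zip archive) to its stored CRC-32."""
    with zipfile.ZipFile(wheel) as zf:
        return {info.filename: info.CRC for info in zf.infolist() if not info.is_dir()}


def compare_wheels(direct_wheel, via_sdist_wheel):
    """Report files that differ between the two build routes."""
    direct = wheel_contents(direct_wheel)
    via_sdist = wheel_contents(via_sdist_wheel)
    shared = direct.keys() & via_sdist.keys()
    return {
        # Present when building directly, lost on the sdist round trip.
        "missing_via_sdist": sorted(direct.keys() - via_sdist.keys()),
        # Present only in the wheel built from the sdist.
        "extra_via_sdist": sorted(via_sdist.keys() - direct.keys()),
        # Same name, different content.
        "changed": sorted(name for name in shared if direct[name] != via_sdist[name]),
    }
```

A CI job could fail the build whenever any of the three lists is non-empty.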

> 
>> When I looked at flit it also suffered the same problem if you forgot to 
>> commit a file to the VCS repository (which meant it wouldn’t get added to 
>> the sdist) 
> 
> You have to explicitly ignore a file to hit this. If you have untracked
> but non-ignored files in your repo, flit will refuse to build an sdist
> at all. I recognise that this is quite strict and still doesn't entirely
> prevent the issue, and I may refine it in the future, but I hope it
> makes such problems hard to hit accidentally.

Ah yes, I think I saw that chunk of code, but it didn’t fully register what
its effect was going to be. So I’ll still assert that this isn’t a problem
specific to distutils/setuptools, but flit itself does make it harder to hit
than I originally thought.
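
For illustration only, a check of this kind could be sketched as below. This
is not flit’s actual code; it assumes git is on PATH, and both function names
are hypothetical.

```python
import subprocess


def untracked_files(repo_dir):
    """List files that git reports as untracked and not ignored."""
    out = subprocess.run(
        ["git", "ls-files", "--others", "--exclude-standard", "-z"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # With -z, names are NUL-separated; drop the empty trailing entry.
    return [name for name in out.split("\0") if name]


def refuse_if_untracked(untracked):
    """Raise instead of building an sdist that would silently omit files."""
    if untracked:
        raise RuntimeError(
            "Untracked, non-ignored files present: " + ", ".join(untracked)
        )
```

A build tool would call refuse_if_untracked(untracked_files(".")) before
creating the sdist; a file that is explicitly ignored would still slip through.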

[1] I know we have --upgrade-strategy, but that is intended to go away after the
transition period of switching our default upgrade behavior is over.

--
Donald Stufft
