I know exactly what's going on now.

When setuptools wants to create the egg-info/SOURCES.txt file, it calls into
distutils to get a full manifest of all the files that are going to be
installed.  For better or worse, what distutils does is:

* walk the filesystem from cwd, getting a list of all the files matching the
  default pattern (e.g. .py).
* apply some default filters (e.g. pruning build/ and $vcs directories)
* filter the remaining list based on MANIFEST.in

The filtered list is what gets written to SOURCES.txt.
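To make the three steps concrete, here is a toy Python model of them. It is not distutils' actual code. The default pattern list, the pruned prefixes, the `model_manifest` helper and the file paths are all made up for illustration. A plain predicate stands in for MANIFEST.in parsing, whose real matching rules are more involved.

```python
import fnmatch

# Hypothetical defaults for this model only; distutils' real lists differ.
DEFAULT_PATTERNS = ["*.py"]
PRUNED_PREFIXES = ("build/", ".git/", ".hg/", ".svn/")


def model_manifest(all_paths, keep):
    """Toy model: walk results -> default pattern -> default prune -> MANIFEST.in.

    all_paths: every file found by walking from cwd (relative, '/'-separated).
    keep: predicate standing in for the MANIFEST.in rules.
    """
    # Step 1: keep files matching a default pattern.  fnmatch's '*' also
    # matches '/', so files in any subdirectory qualify.
    found = [p for p in all_paths
             if any(fnmatch.fnmatchcase(p, pat) for pat in DEFAULT_PATTERNS)]
    # Step 2: default pruning of build/ and VCS directories.
    found = [p for p in found if not p.startswith(PRUNED_PREFIXES)]
    # Step 3: filter the remainder through the MANIFEST.in stand-in.
    return sorted(p for p in found if keep(p))


# Illustrative tree after one Python version has already been built.
paths = [
    "setup.py",
    "pkg/__init__.py",
    "tests/test_pkg.py",
    "README.rst",
    "build/lib/pkg/__init__.py",
    ".pybuild/py3.4/build/pkg/__init__.py",
    "debian/python3-pkg/usr/lib/python3/dist-packages/pkg/__init__.py",
]

# A permissive rule lets the Debian artifacts through; a rule confined
# to tests/ does not.
print(model_manifest(paths, keep=lambda p: True))
print(model_manifest(paths, keep=lambda p: p.startswith("tests/")))
```

The first call returns the .pybuild/ and debian/ copies alongside the real sources. The second returns only tests/test_pkg.py. In both cases, build/ is dropped by the default pruning step.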

You can see how the first step will find all the Debian build artifacts in the
.pybuild/ and debian/ directories, and as each Python version-specific build
proceeds, all the previous build artifacts will show up in subsequent
SOURCES.txt files.

The reason this only affects some packages and not others has everything to do
with whether and how their upstreams have written their MANIFEST.in files.
The semantics of these files is defined here:

https://docs.python.org/2/distutils/sourcedist.html#manifest-template

If they've done something like

    include *.py

then you'll end up with every file that matches that pattern, including ones
in .pybuild/ and debian/.

If they've done something like

    recursive-include tests *

then you probably won't (modulo of course other commands in that file).  There
are some standard files and patterns that always get included or pruned.

What are the options for handling this?

1) Require upstreams to explicitly prune .pybuild/ and debian/

I don't like this because we really don't want to impose restrictions on
upstreams.  They might have a good reason for writing the MANIFEST.in as they
have, and they certainly didn't expect additional build artifacts from
downstreams to pollute their filesystem.  This is a Debian issue so Debian
should handle it.

2) Require maintainers to patch their package's MANIFEST.in.

I don't like this because it's difficult to notice, diagnose, and understand
how to fix.  As you'll see, I am proposing adding some diagnostics to
dh_python3, but even with that, I think it will be easy to miss.  Even if
caught, maintainers might not easily know how to fix the problem.  We could
document that, but it still requires additional quilt patches.  The Debian
tools are at the root of this issue, so I think they should handle it.

3) Modify setuptools/distutils to always prune .pybuild/ and debian/

We could do this, but it means a delta from upstream that we would have
to carry forever.  Yuck.

4) Build the packages elsewhere.

I don't think this is feasible.  Maybe .pybuild/ could be put in the build/
directory, which is a distutils/setuptools artifact that's automatically
ignored, but we still have the debian/ directory to deal with and I think
that's not easily moved.

5) Have dh_python3 post-process the SOURCES.txt file.

I'm not a fan of this because of the extra complexity.  Also, see below.
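For comparison, here is a minimal sketch of what post-processing might involve. It assumes SOURCES.txt entries are plain '/'-separated paths relative to the source root. The `strip_debian_entries` helper and its prefix list are hypothetical and are not part of dh_python3. Even this toy version has to hard-code which directories belong to Debian.

```python
# Hypothetical list of Debian-owned top-level directories to drop.
DEBIAN_PREFIXES = (".pybuild/", "debian/")


def strip_debian_entries(sources_txt):
    """Return SOURCES.txt text minus lines under the Debian build directories."""
    kept = [line for line in sources_txt.splitlines()
            if not line.startswith(DEBIAN_PREFIXES)]
    # Re-terminate every kept line so the result is still one path per line.
    return "".join(line + "\n" for line in kept)


before = ("setup.py\n"
          "pkg/__init__.py\n"
          ".pybuild/py3.4/build/pkg/__init__.py\n"
          "debian/rules\n")
print(strip_debian_entries(before), end="")
```

A real version would also have to read and rewrite the file in every per-version egg-info directory. It would also need to decide what to do with entries it does not recognize.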

6) Ignore differences in SOURCES.txt.

This seems to make the most sense to me.  We *already* delete the SOURCES.txt
from the final binary packages' egg-info directories, which makes sense,
because apt carries its own manifest of installed files.  If we're ignoring
SOURCES.txt in the final egg-info, why not ignore it for comparison purposes?
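Here is a rough sketch of the comparison with SOURCES.txt ignored. It assumes the per-version egg-info directories are flat, with no subdirectories. `egg_info_equal` is a hypothetical helper, not dh_python's actual code. It uses the standard library's `filecmp.cmpfiles` with `shallow=False`, which compares files byte by byte rather than by stat signature.

```python
import filecmp
import os
import tempfile


def egg_info_equal(left, right, ignore=("SOURCES.txt",)):
    """Compare two flat egg-info directories, skipping names in `ignore`."""
    left_names = set(os.listdir(left)) - set(ignore)
    right_names = set(os.listdir(right)) - set(ignore)
    if left_names != right_names:
        return False  # some file exists on only one side
    # shallow=False forces a content comparison of every common file.
    _match, mismatch, errors = filecmp.cmpfiles(
        left, right, sorted(left_names), shallow=False)
    return not mismatch and not errors


def write(directory, name, text):
    with open(os.path.join(directory, name), "w") as f:
        f.write(text)


# Two egg-info dirs that differ only in SOURCES.txt.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    for d in (a, b):
        write(d, "PKG-INFO", "Name: example\n")
    write(a, "SOURCES.txt", "setup.py\n")
    write(b, "SOURCES.txt", "setup.py\n.pybuild/py3.4/build/pkg/__init__.py\n")
    same = egg_info_equal(a, b)              # SOURCES.txt ignored
    strict = egg_info_equal(a, b, ignore=())  # SOURCES.txt compared too
print(same, strict)
```

With the default `ignore`, the two directories compare equal and could be collapsed. Passing `ignore=()` reproduces the current behaviour, where the SOURCES.txt difference blocks the collapse.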

The only downside would be if the rewheel/dirtbike idea moves forward and
those tools need the egg-info/SOURCES.txt file to work.  But since those
(still mythical) tools are distro-specific (maybe even Debian-specific), they
can use Debian tools to get a list of installed files to operate on.

So I advocate #6 and have a branch that implements exactly this.  I've
verified this fixes the problem for at least two affected packages.

You'll notice too that the branch adds some additional diagnostic output in
the case where the filecmp fails.  It prints a helpful message and a unified
diff to stderr so you can at least try to figure out why the bogus
python3.X/.../egg-info directory didn't get collapsed.
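A diagnostic of this kind could look roughly like the following, using the standard library's `difflib.unified_diff`. The `report_mismatch` name, its message text and the file names in the example are invented for illustration. The branch's actual output may differ.

```python
import difflib
import io
import sys


def report_mismatch(left_path, right_path, left_lines, right_lines,
                    stream=None):
    """Explain a failed egg-info comparison and show a unified diff."""
    if stream is None:
        stream = sys.stderr  # resolved at call time, not at definition time
    print(f"{left_path} and {right_path} differ; "
          "not collapsing the egg-info directories", file=stream)
    # unified_diff expects lines that keep their trailing newlines.
    stream.writelines(difflib.unified_diff(
        left_lines, right_lines, fromfile=left_path, tofile=right_path))


# Capture the report in a buffer instead of stderr for demonstration.
buf = io.StringIO()
report_mismatch("left/pkg.egg-info/SOURCES.txt",
                "right/pkg.egg-info/SOURCES.txt",
                ["setup.py\n"],
                ["setup.py\n", ".pybuild/py3.4/build/pkg/__init__.py\n"],
                stream=buf)
print(buf.getvalue(), end="")
```

The "+" line in the diff points straight at the .pybuild/ entry that made the comparison fail. That is the kind of clue a maintainer would need to track down the cause.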

A proposed fix is in the bug801710 branch at
git+ssh://git.debian.org/git/dh-python/dh-python.git
