I know exactly what's going on now. When setuptools wants to create the egg-info/SOURCES.txt file, it calls into distutils to get a full manifest of all the files that are going to be installed. For better or worse, what distutils does is:
 * walk the filesystem from cwd, getting a list of all the files matching the
   default pattern (e.g. .py).
 * apply some default filters (e.g. pruning build/ and $vcs directories)
 * filter the remaining list based on MANIFEST.in

The filtered list is what gets written to SOURCES.txt. You can see how the first step will find all the Debian build artifacts in the .pybuild/ and debian/ directories, and as each Python version-specific build proceeds, all the previous build artifacts will show up in subsequent SOURCES.txt files.

The reason this only affects some packages and not others has everything to do with whether and how their upstreams have written their MANIFEST.in files. The semantics of these files is defined here:

https://docs.python.org/2/distutils/sourcedist.html#manifest-template

If they've done something like

    include *.py

then you'll end up with every file that matches that pattern, including ones in .pybuild/ and debian/. If they've done something like

    recursive-include tests *

then you probably won't (modulo of course other commands in that file). There are some standard files and patterns that always get included or pruned.

What are the options for handling this?

1) Require upstreams to explicitly prune .pybuild/ and debian/

I don't like this because we really don't want to impose restrictions on upstreams. They might have a good reason for writing the MANIFEST.in as they have, and they certainly didn't expect additional build artifacts from downstreams to pollute their filesystem. This is a Debian issue so Debian should handle it.

2) Require maintainers to patch their package's MANIFEST.in.

I don't like this because it's difficult to notice, diagnose, and understand how to fix. As you'll see, I am proposing adding some diagnostics to dh_python3, but even with that, I think it will be easy to miss. Even if caught, maintainers might not easily know how to fix the problem. We could document that, but it still requires additional quilt patches.
The Debian tools are at the root of this issue, so I think they should handle it.

3) Modify setuptools/distutils to always prune .pybuild/ and debian/

We could do this, but it means a delta from upstream that we would have to carry forever. Yuck.

4) Build the packages elsewhere.

I don't think this is feasible. Maybe .pybuild/ could be put in the build/ directory, which is a distutils/setuptools artifact that's automatically ignored, but we still have the debian/ directory to deal with and I think that's not easily moved.

5) dh_python3 post-process the SOURCES.txt file.

I'm not a fan of this because of the extra complexity. Also, see below.

6) Ignore differences in SOURCES.txt.

This seems to make the most sense to me. We *already* delete the SOURCES.txt from the final binary packages' egg-info directories, which makes sense, because apt carries its own manifest of installed files. If we're ignoring SOURCES.txt in the final egg-info, why not ignore it for comparison purposes?

The only downside could be if the rewheel/dirtbike idea moves forward and would require the egg-info/SOURCES.txt file to work, but since those (still mythical) tools are distro-specific (maybe even Debian-specific), they can use Debian tools to get a list of installed files to operate on.

So I advocate #6 and have a branch that implements exactly this. I've verified that this fixes the problem for at least two affected packages.

You'll notice too that the branch adds some additional diagnostic output in the case where the filecmp fails. It prints a helpful message and a unified diff to stderr so you can at least try to figure out why the bogus python3.X/.../egg-info directory didn't get collapsed.

A proposed fix is in the bug801710 branch at git+ssh://git.debian.org/git/dh-python/dh-python.git
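
To illustrate the idea behind option 6, here is a minimal sketch of comparing two egg-info directories while ignoring SOURCES.txt, and printing a unified diff when something else differs. This is not the code from the branch; the function name egg_info_equal, the IGNORED set, and the assumption that an egg-info directory contains only regular text files are all made up for the example.

```python
import difflib
import filecmp
import os
import sys

# Assumption for this sketch: SOURCES.txt is the only file we skip.
IGNORED = {"SOURCES.txt"}


def egg_info_equal(dir_a, dir_b, ignored=IGNORED, log=sys.stderr):
    """Return True if both directories hold the same files with the same
    content, not counting any file whose name is in `ignored`.

    Hypothetical helper; assumes flat directories of regular text files.
    """
    names_a = set(os.listdir(dir_a)) - set(ignored)
    names_b = set(os.listdir(dir_b)) - set(ignored)
    if names_a != names_b:
        print("file lists differ: %s" % sorted(names_a ^ names_b), file=log)
        return False

    same = True
    for name in sorted(names_a):
        path_a = os.path.join(dir_a, name)
        path_b = os.path.join(dir_b, name)
        # shallow=False forces a byte-by-byte comparison of the contents.
        if filecmp.cmp(path_a, path_b, shallow=False):
            continue
        same = False
        with open(path_a, errors="replace") as fa, \
                open(path_b, errors="replace") as fb:
            lines_a, lines_b = fa.readlines(), fb.readlines()
        # Diagnostic output: show exactly what kept the directories apart.
        log.writelines(difflib.unified_diff(
            lines_a, lines_b, fromfile=path_a, tofile=path_b))
    return same
```

With a helper like this, two version-specific egg-info directories that differ only in SOURCES.txt compare as equal, while any other difference is reported as a diff on stderr.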

