On Mon, Aug 15, 2016 at 2:09 PM, Donald Stufft <don...@stufft.io> wrote:
> Hello!
>
> I'd like to restrict what folks can upload to PyPI in an effort to help narrow
> the scope down and to enable a more consistent experience for everyone.
>
> First off, we currently allow people to upload sdist, bdist_wheel, bdist_egg,
> bdist_dmg, bdist_dumb, bdist_msi, bdist_rpm, and bdist_wininst. However I think
> that we should try to get rid of support for most of these. Just for reference
> currently the number of files uploaded for each type of file looks like:
>
> * sdist: 506,585
> * bdist_wheel: 81,207
> * bdist_egg: 48,282
> * bdist_wininst: 14,002
> * bdist_dumb: 5,502
> * bdist_msi: 497
> * bdist_rpm: 464
> * bdist_dmg: 45
>
> Out of all of these, I think that we can easily remove bdist_dmg, bdist_rpm,
> and bdist_dumb. I also believe that there is a strong case for removing
> bdist_msi and bdist_wininst. I also think we should consider removing
> bdist_egg.
>
> First of all, when I say "remove", I mean disallow new uploads, but do not
> delete the existing files.
>
> Looking at each file type:
>
> I think that bdist_dumb is a pretty easy one to remove. Its format is such
> that it's basically not a very useful format to begin with. It takes the full
> path to the files and stores them in the repository. So if I install something
> to /opt/mycoolproject/lib/python3.5/site-packages/froblib/ then it will have
> paths that look like opt/mycoolproject/lib/python3.5/site-packages/froblib/...
> I think this is obviously not very useful and not many people have uploaded any
> bdist_dumb files at all. They are also problematic because they have the same
> file extension as sdists, so pip doesn't really have a great way to
> differentiate between bdist_dumbs and sdists except by trying to guess if it
> contains one of distutils's ad hoc platform tags.
>
> Looking at bdist_rpm, practically nobody has ever used it with a total of 45
> files ever uploaded for it.
> It's not super useful to be able to upload rpms to
> PyPI since it's not an RPM repository so people have to manually download them
> and then install them manually. It's also a bit weird to have support for RPMs
> but not for all of the other package formats that people might want.
>
> Next we have bdist_dmg, bdist_msi, and bdist_wininst. I'm lumping these together
> because they're all OS specific installers for OSs that don't already have some
> sort of repository. This lack of a repository format for them means that random
> downloads are already the norm for people using these systems. For these, I
> think the usage numbers for bdist_dmg and bdist_msi easily suggest that they
> are not very important to continue to support, but it would be weird to
> eliminate them without also eliminating bdist_wininst. The wininst format has
> the most usage out of all of the seldom used formats, however when we look at
> the downloads for the last 30 days only 0.42% of the downloads were for wininst
> files, so I think that it's pretty safe to remove them. I think in the past,
> these were more important due to the lack of real binary packages on Windows,
> but I think in 2016 we have wheel, and Wheel is a better solution. If however
> we want to keep them, then I think it's pretty safe to remove them from our
> /simple/ pages and any future repository pages and modify our mirroring tooling
> to stop mirroring them. IOW, to treat them as some sort of "additional upload"
> rather than release uploads to PyPI.
>
> Finally, bdist_egg is quite possibly the trickiest one to justify. A fair
> number of people still upload eggs, even though we have the wheel format.
> However, I think that we should (and generally do) consider eggs to be
> deprecated and while we don't want to break existing packages by removing them,
> we should block further uploads for them.
> Looking again at the download numbers,
> eggs represented only 1.8% of total downloads in the last 30 days while wheels
> represented 41% and sdists represented 56%. Furthermore, I think we should do
> this with the expectation that any new repository API will no longer include
> egg files in them, and will just be sdists and wheels.
>
> Doing all of this would leave us with:
>
> * The /simple/ repository only having sdists, wheels, and eggs and disallowing
>   new eggs to be uploaded.
> * The hypothetical repository 2.0 api only having sdists and wheels.
> * *MAYBE* Having "related files" that people could upload things like
>   Windows/OSX Installers.
>
> On a related but different note, I'd also like to restrict the acceptable
> extensions for sdists. Currently distutils allows people to generate .tar,
> .tar.gz, .tgz, .tar.bz2, .tbz, .zip, .tar.xz, .tar.Z and possibly others. This
> is a bit problematic because each of those types requires a different set of
> optional libraries (which may or may not exist depending on Python version) so
> you can't really use a lot of them (for example, while .tar.xz might give you
> better compression, it's also not usable before Python 3).
>
> Looking at numbers again, the current number of projects for each file ext are:
>
> * .tar.gz: 444,338
> * .zip: 58,774
> * .tar.bz2: 3,265
> * .tgz: 217
> * Everything Else: 0
>
> These results are not particularly surprising since .tar.gz is the default
> format that distutils creates. What I would like to do is, similarly to above,
> simply stop allowing new uploads for anything but .tar.gz. This will have a few
> positive effects:
>
> * It will make it so that (going forward) there is only a single one of the
>   optional C libraries that Python can be compiled against that is required for
>   someone to install something from PyPI using pip. This would be the zlib
>   library which is available almost universally.
>
> * It will reduce confusion because people will not be able to upload two
>   different source releases for the same version (e.g. example-1.0.tar.gz and
>   example-1.0.zip).
>
> * It will make tooling around PyPI easier, because it'll only have to deal
>   with one file extension for sdists.
>
>
> Thoughts?

My only thought is how we convey this message to users. I wonder if it would be beneficial to have Twine cut a release that warns users when they are uploading something that will be unsupported, then have Warehouse/PyPI start returning a 415 (Unsupported Media Type) a few weeks to a month later.

I'm +1 for restricting the kinds of things people can upload though.
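
To make the warn-first, reject-later idea concrete, here is a rough sketch of what the shared check might look like. This is not Twine's or Warehouse's actual code: the function names and the two policy tables are hypothetical, and the tables simply assume the strictest reading of the proposal (only sdist and bdist_wheel accepted, sdists only as .tar.gz).

```python
import warnings

# Hypothetical policy tables, assuming the strictest reading of the proposal.
ALLOWED_FILETYPES = {"sdist", "bdist_wheel"}
ALLOWED_SDIST_EXTENSIONS = (".tar.gz",)


def upload_problem(filetype, filename):
    """Return a reason string if the upload would be rejected, else None."""
    if filetype not in ALLOWED_FILETYPES:
        return f"{filetype} uploads would no longer be accepted"
    if filetype == "sdist" and not filename.endswith(ALLOWED_SDIST_EXTENSIONS):
        return f"sdists would have to be .tar.gz, got {filename!r}"
    return None


def warn_before_upload(filetype, filename):
    """Client side (transition period): warn, but do not block the upload.

    Returns True if the file would still be accepted after the change.
    """
    reason = upload_problem(filetype, filename)
    if reason is not None:
        warnings.warn(reason, FutureWarning)
    return reason is None


def status_for_upload(filetype, filename):
    """Server side (after the cutoff): 415 for rejected files, else 200."""
    return 200 if upload_problem(filetype, filename) is None else 415
```

With something like this, `warn_before_upload("sdist", "example-1.0.zip")` would emit a FutureWarning during the transition, and `status_for_upload` would answer 415 for the same file once the restriction is enforced. Keeping one predicate behind both sides is just a design choice so that the warning and the rejection cannot drift apart.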