Recently, a proposal was made to change the sorting of links on PyPI's /simple index to prevent problems with easy_install finding out-of-date non-PyPI download links. That proposal, unfortunately, would not have solved the actual problem.

After giving it some thought, I have an alternative proposal that I think *would* solve the problem, and that would work for all scraping tools using the /simple index, not just easy_install.

Essentially, the problem is that when links to "hidden" versions were added to the /simple index (to satisfy users wanting to be able to download older versions' distributions), in-description and home/download page links were included. However, if a package's home page URL or revision control download links change over time, the older ones still show up in the /simple listing, leading to ambiguity for download tools.

However, since the actual use case for which this was added was only to support reaching specific older versions of a project, it isn't actually necessary to include links that aren't to downloadable files with a specific version number.

Say package Foo releases version 1.1, causing 1.0 to become hidden. People still want to be able to download 1.0's .tgz or .rpm files, or whatever other distributions it has. However, they do *not* still need to be able to access the project's older, now-defunct home page, or any of the extra links included in the older version's description.

It is these extraneous links that cause the problem, not the access to PyPI-hosted archives.

Now, it could be argued that if a project used its "download" or "home page" link (or even in-description links) to point to actual archives, then those older archive links would be lost by omitting such links for "hidden" versions. However, if that's really a problem, it could be remedied by simply checking whether the URL contains a file extension, or a revision number, or something like that.
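
As a rough illustration of the file-extension check, here is a minimal Python sketch. The function name and the list of extensions are my own assumptions for the example, not anything PyPI actually implements, and a real check would need a more carefully chosen list:

```python
import posixpath
from urllib.parse import urlparse

# Assumed, illustrative set of distribution file extensions (not PyPI's own list).
ARCHIVE_EXTENSIONS = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg", ".exe", ".rpm")


def looks_like_archive(url):
    """Return True if the URL's path ends in one of the assumed archive extensions."""
    # Only the path is examined, so query strings and #fragments are ignored.
    path = urlparse(url).path.lower()
    filename = posixpath.basename(path)
    return filename.endswith(ARCHIVE_EXTENSIONS)
```

With this heuristic, a link such as http://example.com/dist/Foo-1.0.tgz would be kept for a hidden version, while a bare home page or a revision control URL would not.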

However, since the original request to access hidden versions was aimed squarely at PyPI-hosted downloads, the original use case could still be met simply by only including PyPI-hosted links for "hidden" releases, thereby ensuring that other links are only shown for "current" versions -- i.e., the only versions whose home/download/description links package authors would expect to need to keep up to date.
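
To make the proposed rule concrete, here is a minimal Python sketch of the link selection. The Release class, its field names, and simple_index_links are hypothetical names invented for this example; they do not reflect PyPI's actual data model or code:

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    """Hypothetical stand-in for one release of a project."""
    version: str
    hidden: bool
    hosted_files: list = field(default_factory=list)    # PyPI-hosted download URLs
    external_links: list = field(default_factory=list)  # home/download/description URLs


def simple_index_links(releases):
    """Collect the links to list on a project's /simple page under the proposed rule."""
    links = []
    for release in releases:
        # PyPI-hosted files are always listed, so older versions stay reachable.
        links.extend(release.hosted_files)
        # External links are listed only for current (non-hidden) releases.
        if not release.hidden:
            links.extend(release.external_links)
    return links
```

In the Foo example, the hidden 1.0 release would contribute only its hosted archives, while 1.1 would contribute both its hosted archives and its current home page and download links.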

Making such a change would immediately fix many problematic/ambiguous links in the /simple index, where out-of-date or no-longer-available links are shown. (It would also fix the security issue whereby someone acquiring a no-longer-in-service URL could link it to trojan downloads.)

_______________________________________________
Catalog-SIG mailing list
[email protected]
http://mail.python.org/mailman/listinfo/catalog-sig
