I've added the proposal to the wiki to keep collecting comments and updates:
http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal

On 28.02.2013 12:55, M.-A. Lemburg wrote:
> On 28.02.2013 12:45, Donald Stufft wrote:
>> On Thursday, February 28, 2013 at 5:55 AM, M.-A. Lemburg wrote:
>>> I think we all agree that scanning arbitrary HTML pages
>>> for download links is not a good idea and we need to
>>> transition away from this towards a more reliable system.
>>>
>>> Here's an approach that would work to start the transition
>>> while not breaking old tools (sketching here to describe the
>>> basic idea):
>>>
>>> Limiting scans to download_url
>>> ------------------------------
>>>
>>> Installers and similar tools preferably no longer scan all
>>> the links on the /simple/ index, but instead only look at
>>> the download links (which can be defined in the package
>>> meta data) for packages that don't host files on PyPI.
>>>
>>> Going only one level deep
>>> -------------------------
>>>
>>> If the download links point to a meta-file named
>>> "<packagename>-<version>-downloads.html#<sha256-hashvalue>",
>>> the installers download that file, check whether the
>>> hash value matches and if it does, scan the file in
>>> the same way they would parse the /simple/ index page of
>>> the package - think of the downloads.html file as a symlink
>>> to extend the search to an external location, but in a
>>> predefined and safe way.
>>>
>>> Comments
>>> --------
>>>
>>> * The creation of the downloads.html file is left to the
>>>   package owner (we could have a tool to easily create it).
>>>
>>> * Since the file would use the same format as the PyPI
>>>   /simple/ index directory listing, installers would be
>>>   able to verify the embedded hash values (and later
>>>   GPG signatures) just as they do for files hosted directly
>>>   on PyPI.
>>>
>>> * The URL of the downloads.html file, together with the
>>>   hash fragment, would be placed into the setup.py
>>>   download_url variable. This is supported by all recent
>>>   and not so recent Python versions.
>>>
>>> * No changes to older Python versions of distutils are
>>>   necessary to make this work, since the download_url
>>>   field is a free form field.
>>>
>>> * No changes to existing distutils meta data formats are
>>>   necessary, since the download_url field has always
>>>   been meant for download URLs.
>>>
>>> * Installers would not need to learn about a new meta
>>>   data format, because they already know how to parse
>>>   PyPI style index listings.
>>>
>>> * Installers would prefer the above approach for downloads,
>>>   and warn users if they have to fall back to the old
>>>   method of scanning all links.
>>>
>>> * Installers could impose extra security requirements,
>>>   such as only following HTTPS links and verifying
>>>   all certificates.
>>>
>>> * In a later phase of the transition we could have
>>>   PyPI cache the referenced distribution files locally
>>>   to improve reliability. This would turn the push
>>>   strategy for uploading files to PyPI into a pull
>>>   strategy for those packages and make things a lot
>>>   easier to handle for package maintainers.
>>>
>> I don't have time to respond to the rest right now, but I don't
>> think this is doable. The purpose of that legalese you pointed out
>> is to make it possible for PyPI to serve those files legally. We
>> don't know if those files are something PyPI is allowed to
>> distribute, so PyPI can't cache them.
>
> Thanks for the note.
>
> The legalese could be adapted to make this work (if needed)
> or we could add a flag to the downloads.html file which makes
> the choice explicit on a per package basis - the latter might
> be the better option to address packages that are subject to
> export control or other restrictions.
>

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 28 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/
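
To make the "going only one level deep" step of the proposal more concrete, here is a minimal sketch of what an installer might do with a download_url of the form "<packagename>-<version>-downloads.html#<sha256-hashvalue>". It is only an illustration under a few assumptions that the proposal does not spell out: the fragment is taken to be the bare hex SHA-256 digest of the file contents, the page is assumed to be UTF-8, and the names LinkCollector, split_download_url and verified_links are made up for this example. Fetching the page (and any HTTPS or certificate checks) is left to the caller.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags, much as on a /simple/ index page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_download_url(download_url):
    """Split '<url>#<sha256-hashvalue>' into (url, expected_hex_digest).

    Assumption: the fragment is the bare hex digest, nothing else.
    """
    url, fragment = urldefrag(download_url)
    return url, fragment.lower()


def verified_links(download_url, page_bytes):
    """Return absolute links found in the meta-file.

    Raises ValueError if the SHA-256 of page_bytes does not match the
    hash fragment of download_url, so unverified pages are never scanned.
    """
    url, expected = split_download_url(download_url)
    actual = hashlib.sha256(page_bytes).hexdigest()
    if actual != expected:
        raise ValueError("downloads.html hash mismatch")
    parser = LinkCollector()
    parser.feed(page_bytes.decode("utf-8"))  # assumption: UTF-8 page
    # Resolve relative hrefs against the location of the meta-file itself.
    return [urljoin(url, href) for href in parser.hrefs]


if __name__ == "__main__":
    # Hypothetical package and host, used only to exercise the functions.
    page = b'<a href="example-1.0.tar.gz#sha256=abc">example-1.0.tar.gz</a>'
    digest = hashlib.sha256(page).hexdigest()
    base = "https://example.com/dl/example-1.0-downloads.html#"
    print(verified_links(base + digest, page))
```

The links returned this way would then go through the same per-file hash checks an installer already applies to files listed on the /simple/ index; the sketch stops at the point where the meta-file itself has been verified and scanned.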