Re: [Catalog-sig] Deprecate External Links
On Tuesday, March 5, 2013 at 4:01 AM, Donald Stufft wrote:

On Thursday, February 28, 2013 at 8:35 AM, Donald Stufft wrote:

https://crate.io/externally-hosted/

A list of things that have no files hosted on PyPI but have a release. This doesn't include projects that upload sometimes but not every time (argparse, for example: its latest releases have not been uploaded to PyPI).

Sorted out a better way of seeing what would be affected by this change. Here is a list of all versions that are currently installable via pip but are not hosted on PyPI (and thus would be uninstallable if all external links were removed). This filters out projects that never existed or are no longer installable due to issues with the external hosting. I've also included the script I used to generate it.

https://gist.github.com/dstufft/5088915

Here are some numbers fetched from that data.

928 projects w/ 2750 total versions have versions not installable directly from PyPI.

721 projects w/ 2543 total versions have versions not installable directly from PyPI if we don't consider the `dev` version.

This change would affect 2-3% of the projects on PyPI, and just from scanning down the list, some of these appear to merely be a forgotten upload and not a conscious choice to not host their packages on PyPI (for example, Django has only 1 version not installable directly from PyPI).

___
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] Deprecate External Links
On Fri, Mar 01, 2013 at 10:02 +0100, Reinout van Rees wrote:

On 28-02-13 21:08, holger krekel wrote:

I have seen that position in this discussion (I have to upload 120 files per release, so I won't do that, for instance).

haven't seen that.

Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:

However, taking our egenix-mx-base package as example, we have 120 distribution files for every single release. Uploading those to PyPI would not only take long, but also ...

Ah ok, thanks. Didn't interpret Marc-Andre's post as claiming that downloads/homepage crawling is a good idea, though. Just that there have been reasons not to upload things which need to be addressed, especially the need for enough storage space.

best, holger

Reinout

--
Reinout van Rees http://reinout.vanrees.org/
rein...@vanrees.org http://www.nelen-schuurmans.nl/
If you're not sure what to do, make something. -- Paul Graham
Re: [Catalog-sig] Deprecate External Links
On 01.03.2013 10:02, Reinout van Rees wrote:

On 28-02-13 21:08, holger krekel wrote:

I have seen that position in this discussion (I have to upload 120 files per release, so I won't do that, for instance).

haven't seen that.

Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:

However, taking our egenix-mx-base package as example, we have 120 distribution files for every single release. Uploading those to PyPI would not only take long, but also ...

Correct, with a total of over 100MB per release.

However, the above quote is slightly incorrect: I did not say I won't do that, just that there are issues with doing this:

* It currently takes too long to upload that many files to PyPI. This causes a problem, since in order to start the upload, we have to register the release on PyPI, which tools will then immediately find. However, during the upload time, they won't necessarily find the right files to download and then fail.

  The proposed pull mechanism (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal) would work around this problem: tools would simply go to our servers in case they can't find the files on PyPI.

* PyPI doesn't allow us to upload two egg files with the same name: we have to provide egg files for UCS2 Python builds and UCS4 Python builds, since easy_install/setuptools/pip don't differentiate between the two variants. This is the main reason why we're hosting our own PyPI-style indexes, one for UCS2 and the other for UCS4 builds:

  https://downloads.egenix.com/python/index/ucs2/
  https://downloads.egenix.com/python/index/ucs4/

* I'm not sure whether we want to import our crypto packages to the US, so for a subset of the files, we'd probably continue to use our servers in Germany. Again, with the above proposal, this shouldn't be a problem.

* The PyPI terms are a bummer for us, but this can be fixed, I guess.

If we can resolve the issues, we'd have no problem having the files mirrored on PyPI.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Mar 01 2013)
Python Projects, Consulting and Support ... http://www.egenix.com/
mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

: Try our mxODBC.Connect Python Database Interface for free ! ::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
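The UCS2/UCS4 split mentioned in the message above can be detected at run time. What follows is only a minimal sketch, assuming a Python 2-era interpreter where `sys.maxunicode` distinguishes narrow (UCS2) from wide (UCS4) builds; on Python 3.3 and later the value is always 0x10FFFF, so the distinction no longer exists. The helper names `unicode_variant` and `index_url` are hypothetical; the two index URLs are the ones quoted in the message.

```python
import sys

# Index URLs as given in the message above.
INDEXES = {
    "ucs2": "https://downloads.egenix.com/python/index/ucs2/",
    "ucs4": "https://downloads.egenix.com/python/index/ucs4/",
}


def unicode_variant(maxunicode=None):
    """Return 'ucs2' for a narrow build and 'ucs4' for a wide build.

    `maxunicode` defaults to the running interpreter's sys.maxunicode;
    it is a parameter only so the logic can be exercised for both cases.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return "ucs4" if maxunicode > 0xFFFF else "ucs2"


def index_url(maxunicode=None):
    """Pick the index that matches the interpreter's Unicode variant."""
    return INDEXES[unicode_variant(maxunicode)]


if __name__ == "__main__":
    print(index_url())
```

The resulting URL could then be handed to an installer's index option by whoever runs the install; the installers themselves did not make this choice automatically, which is the problem the message describes.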
Re: [Catalog-sig] Deprecate External Links
On Fri, Mar 01, 2013 at 10:24 +0100, M.-A. Lemburg wrote:

On 01.03.2013 10:02, Reinout van Rees wrote:

On 28-02-13 21:08, holger krekel wrote:

I have seen that position in this discussion (I have to upload 120 files per release, so I won't do that, for instance).

haven't seen that.

Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:

However, taking our egenix-mx-base package as example, we have 120 distribution files for every single release. Uploading those to PyPI would not only take long, but also ...

Correct, with a total of over 100MB per release.

However, the above quote is slightly incorrect: I did not say I won't do that, just that there are issues with doing this:

* It currently takes too long uploading that many files to PyPI. This causes a problem, since in order to start the upload, we have to register the release on PyPI, which tools will then immediately find. However, during the upload time, they won't necessarily find the right files to download and then fail.

You can actually skip the register and directly upload; it will create release metadata on the fly. Not sure if it's complete, but you can then do a register to update it if needed.

best, holger

The proposed pull mechanism (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal) would work around this problem: tools would simply go to our servers in case they can't find the files on PyPI.

* PyPI doesn't allow us to upload two egg files with the same name: we have to provide egg files for UCS2 Python builds and UCS4 Python builds, since easy_install/setuptools/pip don't differentiate between the two variants. This is the main reason why we're hosting our own PyPI-style indexes, one for UCS2 and the other for UCS4 builds:

  https://downloads.egenix.com/python/index/ucs2/
  https://downloads.egenix.com/python/index/ucs4/

* I'm not sure whether we want to import our crypto packages to the US, so for a subset of the files, we'd probably continue to use our servers in Germany. Again, with the above proposal, this shouldn't be a problem.

* The PyPI terms are a bummer for us, but this can be fixed, I guess.

If we can resolve the issues, we'd have no problem having the files mirrored on PyPI.

--
Marc-Andre Lemburg
eGenix.com
Re: [Catalog-sig] Deprecate External Links
Marc Andre: I'm cc'ing Van. Can you explain why the PyPI terms are a bummer, so we can see if there is actually an issue to be resolved or a matter of taste? We need to protect the foundation while preserving author rights - but I don't want one user / subset dictating how we evolve the technology.

Jesse

On Mar 1, 2013, at 4:24 AM, M.-A. Lemburg m...@egenix.com wrote:

On 01.03.2013 10:02, Reinout van Rees wrote:

On 28-02-13 21:08, holger krekel wrote:

I have seen that position in this discussion (I have to upload 120 files per release, so I won't do that, for instance).

haven't seen that.

Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:

However, taking our egenix-mx-base package as example, we have 120 distribution files for every single release. Uploading those to PyPI would not only take long, but also ...

Correct, with a total of over 100MB per release.

However, the above quote is slightly incorrect: I did not say I won't do that, just that there are issues with doing this:

* It currently takes too long uploading that many files to PyPI. This causes a problem, since in order to start the upload, we have to register the release on PyPI, which tools will then immediately find. However, during the upload time, they won't necessarily find the right files to download and then fail.

  The proposed pull mechanism (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal) would work around this problem: tools would simply go to our servers in case they can't find the files on PyPI.

* PyPI doesn't allow us to upload two egg files with the same name: we have to provide egg files for UCS2 Python builds and UCS4 Python builds, since easy_install/setuptools/pip don't differentiate between the two variants. This is the main reason why we're hosting our own PyPI-style indexes, one for UCS2 and the other for UCS4 builds:

  https://downloads.egenix.com/python/index/ucs2/
  https://downloads.egenix.com/python/index/ucs4/

* I'm not sure whether we want to import our crypto packages to the US, so for a subset of the files, we'd probably continue to use our servers in Germany. Again, with the above proposal, this shouldn't be a problem.

* The PyPI terms are a bummer for us, but this can be fixed, I guess.

If we can resolve the issues, we'd have no problem having the files mirrored on PyPI.

--
Marc-Andre Lemburg
eGenix.com
Re: [Catalog-sig] Deprecate External Links
On Fri, Mar 1, 2013 at 4:24 AM, M.-A. Lemburg m...@egenix.com wrote:

On 01.03.2013 10:02, Reinout van Rees wrote:

On 28-02-13 21:08, holger krekel wrote:

I have seen that position in this discussion (I have to upload 120 files per release, so I won't do that, for instance).

haven't seen that.

Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:

However, taking our egenix-mx-base package as example, we have 120 distribution files for every single release. Uploading those to PyPI would not only take long, but also ...

Correct, with a total of over 100MB per release.

However, the above quote is slightly incorrect: I did not say I won't do that, just that there are issues with doing this:

* It currently takes too long uploading that many files to PyPI. This causes a problem, since in order to start the upload, we have to register the release on PyPI, which tools will then immediately find. However, during the upload time, they won't necessarily find the right files to download and then fail.

Actually, easy_install doesn't pay any attention to what releases are registered. It just looks for primary and secondary links. If there are links for a version that it can use, it uses it. If it does not find links for a version, then that version does not exist, as far as it is concerned. So registering without files is not a problem.

The proposed pull mechanism (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal) would work around this problem: tools would simply go to our servers in case they can't find the files on PyPI.

That proposal is unnecessary, actually. You could *right now* simply place binary download links (with optional #md5= verification) in your package's description field, and the moment you registered the package, existing tools would find those links and download them from your site. You could then remove your home page and download URLs from the relevant fields, and place them also in the description. (easy_install does not follow non-download links within the description -- i.e., links that don't end in .egg, .tgz, etc. and don't have an #egg tag.)

* PyPI doesn't allow us to upload two egg files with the same name: we have to provide egg files for UCS2 Python builds and UCS4 Python builds, since easy_install/setuptools/pip don't differentiate between the two variants.

They can if it's part of the platform string; the catch is that right now it's not. We'd have to go through an upgrade cycle of the tools to support that. I need to take a look at what PEP 427 is doing (and you should too, if you haven't already) to get this part sorted out.
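The link rule described in the reply above (follow a link from the description only if it ends in an archive suffix or carries an #egg tag, with an optional #md5= fragment for verification) can be illustrated roughly as follows. This is a sketch of the rule as stated in the message, not setuptools' actual code; the function names and the exact suffix list are assumptions, since the message only says ".egg, .tgz, etc.".

```python
from urllib.parse import urlsplit

# Assumed suffix list; the message names only ".egg, .tgz, etc."
ARCHIVE_SUFFIXES = (".egg", ".tgz", ".tar.gz", ".zip")


def looks_like_download(url):
    """True if the URL would count as a download link under the stated rule."""
    parts = urlsplit(url)
    if parts.fragment.startswith("egg="):
        # An explicit #egg=name-version tag marks the link as a download.
        return True
    return parts.path.lower().endswith(ARCHIVE_SUFFIXES)


def md5_from_fragment(url):
    """Return the digest from an optional #md5=... fragment, else None."""
    fragment = urlsplit(url).fragment
    if fragment.startswith("md5="):
        return fragment[len("md5="):]
    return None
```

Under this reading, a plain home-page link in the description is ignored, while an archive link with an `#md5=` fragment is both followed and verifiable.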
Re: [Catalog-sig] Deprecate External Links
On Thu, Feb 28, 2013 at 5:01 PM, Donald Stufft donald.stu...@gmail.com wrote:

I'm glad the next set of Metadata won't have external links; however, even if it showed up tomorrow, it's going to be a long time until people are completely migrated to it. Furthermore, you estimate months, but the first phase will have positive benefits right away, namely that it will prompt people to start uploading their packages, increasing the security and reliability of the current system.

And finally, while I'm glad to see forward movement: it's been said before not to bother making a fix to the existing system because X was going to happen soon. In the past it was distutils2/packaging, now it's PEP426/packaging. While I have every hope, and I believe it will happen this time, the past has made me worry about holding off on good incremental improvements to the current infra.

Pissing off the maintainers of packages that currently rely on external hosting by telling them they have to change their release processes if they want to keep releasing software on PyPI and have their users actually be able to download it is *not* a good idea, especially when we're about to ask them to upgrade their build chains for other reasons (including both security and reliability).

Working on the installation tools and getting them to complain about external links is a *fine* idea. It doesn't break anything, but maintainers will start fielding questions from their users asking "Hey, why am I getting this warning when I install your project?".

Working on the upload tools and having them *warn* distributors that self-hosting is problematic is also a good idea, as is exploring PJE's suggestions about refining the set of URLs that PyPI currently publishes.

However, getting PyPI to effectively *break uploads* of projects that rely on external links at this point in time is *not* a good idea: we should NOT mess with people's existing build and upload processes lightly, as any such changes burn up a *lot* of community good will, and that's not something in great supply when it comes to Python's software distribution infrastructure.

All current generation infrastructure should continue to work without modification on both the upload side *and* the download side (although, as noted above, it's highly desirable for both the upload side and the download side to be updated to warn users about any reliance on insecure legacy behaviour).

I expect a similar rollout in the transition to the next generation metadata format and distribution infrastructure - initially download tools will support both formats, emitting a warning when falling back to the legacy distribution infrastructure, then they will start requiring an option to enable fallback to legacy mode, and eventually there will be released installation tools that don't support the legacy distribution infrastructure at all (such as any default installer included in the standard library).

For *next* generation infrastructure, it's our job as system designers to sell it to potential users (primarily everyone writing software which they publish on PyPI, or at least the authors of the toolchains used for that publication, but also to consumers of that software). The distutils2 team failed, in large part because their proposal required radical changes to the way people published Python software. I have deliberately moderated that in the way I have approached PEP 426 - if people can't generate the new metadata with only minor changes to their current processes, it isn't going to fly, and any trade-offs (such as the loss of external hosting support) need to be bought with corresponding benefits (such as guaranteed correct pre-release handling, solid Python version declarations, a clean post-install hook design, and, hopefully, a vastly improved rich metadata publication system for PyPI, probably based on TUF).

Regards,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Catalog-sig] Deprecate External Links
On Thu, Feb 28, 2013 at 6:12 PM, M.-A. Lemburg m...@egenix.com wrote:

On 28.02.2013 07:39, Nick Coghlan wrote:

1. The next generation metadata infrastructure will NOT support external hosting of files indexed on PyPI - if you don't upload the archive files to PyPI, they won't be included in the next generation metadata. If you want external hosting, you will need to run a separate index (this is similar to the yum model - you can host files wherever you want, but you need to run createrepo yourself to generate the metadata, and instruct users on how to get their installers to retrieve your metadata. The big difference between PyPI and the yum model is that the default index still won't be curated at all, so there's no review process to get through if you want to use it, thus less need for external hosting).

Could you elaborate on this ? AFAIK, the metadata only works on package names, regardless of where an installer finds them.

Caveat: this is NOT a final design, and people that aren't me will be working out the exact details. It is, however, how I want it to work.

The next generation metadata publication infrastructure is likely to be based on TUF, and thus will consist of pregenerated, signed metadata served as static files. Installers will just download metadata files, sdists and wheels (and probably eggs and tarballs), and never need to contact an active web service.

The only active web service technically required will be one to regularly refresh the signed timestamp file that prevents certain kinds of attacks based on providing old, insecure versions of software (a cron job running on the server hosting the metadata would suffice for this task). PyPI itself will have another active service to automatically regenerate the metadata when files are uploaded by maintainers.

The delegation of trust within the framework will be defined only for files hosted by PyPI - it will not be extended to allow the declaration of external URLs as a source for the target files. Publishers will still be able to publish on external sites, but they will need to generate their own metadata, and distributions published that way won't be indexed in the next generation metadata on PyPI.

This is the same way yum repos work - the metadata for each repo only covers SRPMs and RPMs hosted in that repo. If you want to download software from somewhere else, you have to add another repo definition in the client so it knows where to look for the metadata. APT works in a similar fashion.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
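The role of the regularly refreshed timestamp file mentioned above can be shown with a deliberately simplified sketch. It is not the TUF client or its API: signing and verification are omitted entirely, and the function names, the dictionary layout, and the one-day lifetime are all assumptions made for illustration. The only idea it demonstrates is that a client refuses metadata whose timestamp has expired, so an attacker replaying an old snapshot is detected once the refresh job's latest timestamp supersedes it.

```python
from datetime import datetime, timedelta, timezone


def refresh_timestamp(now, lifetime=timedelta(days=1)):
    """What the periodic job would do: issue a timestamp valid for `lifetime`.

    A real system would also sign this record; that step is left out here.
    """
    return {"issued": now, "expires": now + lifetime}


def accept_timestamp(timestamp, now):
    """Client-side check: reject metadata whose timestamp has expired."""
    return now < timestamp["expires"]


if __name__ == "__main__":
    issued_at = datetime(2013, 3, 1, tzinfo=timezone.utc)
    stamp = refresh_timestamp(issued_at)
    print(accept_timestamp(stamp, issued_at + timedelta(hours=12)))
    print(accept_timestamp(stamp, issued_at + timedelta(days=2)))
```

Because the check is a pure comparison against a static file, the installer never needs to contact an active service, which matches the static-file design described in the message.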
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 22:04 +0100, Lennart Regebro wrote:

On Wed, Feb 27, 2013 at 8:49 PM, Monty Taylor mord...@inaugust.com wrote:

But wouldn't this only be a change in pip/easy_install, not PyPI itself? I suppose you could explicitly break the external links by having them point to nothing if you are worried about the security or if it's some performance issue (that would indeed be a bad compatibility break, in case people are using those for other purposes). Otherwise, if it's a problem, then just use the old version of pip.

If we don't remove the feature from pypi itself

It isn't a feature of PyPI. PyPI doesn't require you to upload the files to PyPI. For that reason, easy_install and pip will scrape external sites to be able to download the files. What we should do is agree that this should stop, add a deprecation warning to pip and easy_install, and after some pre-determined time remove the feature from easy_install and pip.

I suggest to *change defaults* rather than to remove the feature for the foreseeable future. Changing defaults is a powerful way to communicate and one that doesn't leave people totally stranded who are far removed from discussions and rationales here.

folks for whom it's a problem, because there will be no incentive for the folks hosting their software that way to actually upload their stuff to PyPI

Yes there will be: Everyone mailing them to tell them their software is broken and can't be installed with easy_install and pip. That's going to be very annoying very fast. ;-)

I've mailed several maintainers in the last half year of 1K downloaded projects to inquire about status, and not received replies. I wanted to base work on their projects and of course I refrained from doing that because of the lack of replies. To me that means you can have users mailing maintainers or screaming at maintainers or saying bad words about maintainers or projects all you want, but that doesn't mean it's going to be fixed.

To summarize, having pip/easy_install report red warnings and requiring users to pass a --htmlscrape=PROJ1,PROJ2 option or so is a good way to communicate; removing the ability is not, at this point.

best, holger
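The "change defaults" idea summarized above could look something like the following sketch. The `--htmlscrape` option is the hypothetical flag floated in the message; it is not an option of pip or easy_install, and the program name and helper functions here are likewise made up for illustration. Scraping is off by default and enabled only for projects the user names explicitly.

```python
import argparse


def parse_args(argv):
    """Parse a hypothetical installer command line."""
    parser = argparse.ArgumentParser(prog="installer-sketch")
    parser.add_argument(
        "--htmlscrape",
        default="",
        help="comma-separated projects whose external pages may be scraped",
    )
    parser.add_argument("requirements", nargs="+")
    return parser.parse_args(argv)


def scrape_plan(argv):
    """Map each requested project to whether external scraping is allowed."""
    args = parse_args(argv)
    allowed = {
        name.strip().lower()
        for name in args.htmlscrape.split(",")
        if name.strip()
    }
    # Default is False: scraping happens only for explicitly listed projects.
    return {req: req.lower() in allowed for req in args.requirements}


if __name__ == "__main__":
    print(scrape_plan(["--htmlscrape=PROJ1,PROJ2", "proj1", "other"]))
```

An installer built this way would warn (or fail) for `other` while still installing `proj1` from its external page, so nobody is stranded but the opt-in is visible.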
Re: [Catalog-sig] Deprecate External Links
On Thu, Feb 28, 2013 at 09:48 +1100, Richard Jones wrote:

On 28 February 2013 08:31, PJ Eby p...@telecommunity.com wrote:

OTOH, I currently make development snapshots of setuptools and other projects available by dumping them in a directory that's used as an external download URL. Replacing that would be a PITA because PyPI only lets you upload and register new releases from distutils' command line. Basically, I'd need to use a download link that pointed to a latest URL that redirected to the final download.

Yup, and the down-side of distutils as the tool for talking to PyPI is, of course, the horrendous turn-around time trying to add features or fix bugs. I've advocated us having the upload/register/whatever functionality in a separate tool for a while, but that doesn't seem to have gained any traction. Of course issues around the complexity introduced by setup.py make it much harder.

FWIW three days ago I presented at Pycon Russia a unifying cmdline workflow meta tool which configures and invokes setup.py [...]/pip/easy_install commands. I intend to publish it soon and will also send a link once the video becomes available. IOW, I fully agree we need to move away from putting things into setup.py/distutils, start going for PEP426 etc. -- but WITHOUT breaking things for all the packaging upload/installation processes out there. Therefore a meta tool approach to make it easier for people to gradually move away from current practises.

cheers, holger

In the mean time I think Donald's suggestion for supporting development pre-releases is reasonable: instead of (please replace with easy_install lingo here)

`pip install setuptools==setuptools-dev`

please

`pip install -e http://svn.python.org/projects/sandbox/trunk/setuptools/#egg=setuptools-dev`

?

Richard
Re: [Catalog-sig] Deprecate External Links
On Thu, Feb 28, 2013 at 06:38 +0100, Andreas Jung wrote:

+1 for the proposal

The complete discussion on this topic is once again absurd and bizarre. We are discussing the issue with externally hosted packages every year and the situation has not improved. Especially people using buildout very regularly encounter issues with external sites being down - with the result that we can not install or update our installation.

I give a shit at the arguments pulled out every time by package maintainers using PyPI only for listing their packages. I am both annoyed and bothered by these people.

I didn't see such positions from package maintainers here. In fact I haven't seen anyone stepping up saying listing packages externally is a great idea. Could you point to those posts?

However, I have seen concerns about breaking many people's and companies' processes and thus thoughts on how to do a good transition. I guess you don't want to communicate to package-users the way you do above to package maintainers.

best, holger
Re: [Catalog-sig] Deprecate External Links
On 28 February 2013 20:09, holger krekel hol...@merlinux.eu wrote:

On Thu, Feb 28, 2013 at 09:48 +1100, Richard Jones wrote:

On 28 February 2013 08:31, PJ Eby p...@telecommunity.com wrote:

OTOH, I currently make development snapshots of setuptools and other projects available by dumping them in a directory that's used as an external download URL. Replacing that would be a PITA because PyPI only lets you upload and register new releases from distutils' command line. Basically, I'd need to use a download link that pointed to a latest URL that redirected to the final download.

Yup, and the down-side of distutils as the tool for talking to PyPI is, of course, the horrendous turn-around time trying to add features or fix bugs. I've advocated us having the upload/register/whatever functionality in a separate tool for a while, but that doesn't seem to have gained any traction. Of course issues around the complexity introduced by setup.py make it much harder.

FWIW three days ago I presented at Pycon Russia a unifying cmdline workflow meta tool which configures and invokes setup.py [...]/pip/easy_install commands. I intend to publish it soon and will also send a link once the video becomes available. IOW, I fully agree we need to move away from putting things into setup.py/distutils, start going for PEP426 etc. -- but WITHOUT breaking things for all the packaging upload/installation processes out there. Therefore a meta tool approach to make it easier for people to gradually move away from current practises.

Awesome!

For what it's worth I spent some time today trying to dig up some actual stats on the number of packages with only download_url (roughly 10%) and how popular they are (roughly 90% of those packages were looked up in the /simple index in the last day). I'm still poking at the numbers though.

Richard
Re: [Catalog-sig] Deprecate External Links
no support for UCS2/UCS4 binary distributions, unsupported distribution file formats (e.g. our prebuilt format),

Not sure why PyPI would even care what charset the package files use, but if true that's certainly a bug and we can get that fixed. What file formats do pip/buildout support that PyPI doesn't support for uploads?

Basically, this is all about spam/abuse prevention. I don't want people to upload movie files (whether they be pirated movies or porn files, or just home video) to abuse PyPI as a general file hosting service, and I don't see a way to manually redact the content on PyPI.

Regards,
Martin
Re: [Catalog-sig] Deprecate External Links
On Thursday, February 28, 2013 at 5:29 AM, M.-A. Lemburg wrote:

On 27.02.2013 19:21, Donald Stufft wrote:

On Wednesday, February 27, 2013 at 1:11 PM, M.-A. Lemburg wrote:

On 27.02.2013 18:37, Donald Stufft wrote:

On Wednesday, February 27, 2013 at 12:10 PM, M.-A. Lemburg wrote:

Package installers only need access to the static files in the /simple/ index. Those can be put behind a CDN to increase uptime. PyPI itself doesn't have to be up and running if you just want to download the files (unfortunately, that's not true at the moment, because the /simple/ index is dynamically generated, but that can be changed). See http://wiki.python.org/moin/CloudPyPI for details.

I'm aware of that, but that doesn't change the statement. If /simple/ is down you cannot determine the external urls. There is no way to increase uptime by adding more points of failure.

Please reread the proposal. The /simple/ index would get hosted on a separate domain which then points to the CDN.

It. Does. Not. Matter. You are simply moving the SPOF which is /simple/. If /simple/ is how you discover the CDN and/or external urls, then the things it points to can have 100% uptime and if /simple/ is down the entire system is down.

We appear to be talking about different things :-) The proposal suggests to put the /simple/ index itself on Amazon S3 and then have CloudFront distribute the files to the end users. The PyPI server would only manage pushing the file to the S3 buckets. PyPI could go down and Amazon would still be serving the files. See the "Moving static data to a CDN" section of http://wiki.python.org/moin/CloudPyPI/Proposal

I'm aware of what you're talking about; Amazon doesn't have 100% uptime. Moving that there is good for other reasons, but it doesn't magically make adding multiple single points of failure defy the laws of nature.
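The uptime argument in the exchange above is the usual arithmetic for components that all have to be up at once: the availability of the chain is the product of the individual availabilities, so it can never exceed that of the weakest link. A small worked sketch follows; the specific percentages are invented for illustration and are not measurements of PyPI or any host, and the calculation assumes failures are independent.

```python
def chain_availability(*availabilities):
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so the individual values multiply.
    """
    result = 1.0
    for value in availabilities:
        result *= value
    return result


# Made-up numbers: an index at 99.9%, each external host at 99%.
index_only = chain_availability(0.999)
index_plus_two_hosts = chain_availability(0.999, 0.99, 0.99)

if __name__ == "__main__":
    print("index only:          %.4f" % index_only)
    print("index + 2 ext hosts: %.4f" % index_plus_two_hosts)
```

Each additional host a requirement set depends on multiplies in another factor below 1, which is why adding required hosts can only lower the expected uptime, regardless of how reliable any single one is.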
Re: [Catalog-sig] Deprecate External Links
On Thursday, February 28, 2013 at 7:56 AM, Reinout van Rees wrote:

On 28-02-13 10:43, holger krekel wrote:

On Thu, Feb 28, 2013 at 06:38 +0100, Andreas Jung wrote:

I give a shit at the arguments pulled out every time by package maintainers using PyPI only for listing their packages. I am both annoyed and bothered by these people.

I didn't see such positions from package maintainers here. In fact I haven't seen anyone stepping up saying listing packages externally is a great idea. Could you point to those posts?

The position Andreas probably means is projects that *do* advertise themselves on pypi, but don't put their files there. I have seen that position in this discussion (I have to upload 120 files per release, so I won't do that, for instance).

Some arguments might be valid, but these projects *are*, taken as one group, actively breaking pip and buildout regularly. So I agree with Andreas. I don't really care about the arguments pulled out every time. Effectively actively breaking pip and buildout is bad, period.

Reinout

--
Reinout van Rees http://reinout.vanrees.org/
rein...@vanrees.org http://www.nelen-schuurmans.nl/
If you're not sure what to do, make something. -- Paul Graham

https://crate.io/externally-hosted/

A list of things that have no files hosted on PyPI but have a release. This doesn't include projects that upload sometimes but not every time (argparse, for example: its latest releases have not been uploaded to PyPI).
Re: [Catalog-sig] Deprecate External Links
On Thu, Feb 28, 2013 at 7:43 AM, Reinout van Rees rein...@vanrees.org wrote:
> On 27-02-13 16:26, Donald Stufft wrote:
>> 2. External links decrease the expected uptime for a particular set of requirements. PyPI itself has become very stable; however, the same cannot be said for all of the hosts linked that the toolchain processes. Each new host is an additional SPOF.
>
> A very good practical illustration: my colleague cannot pip install mercurial right now because the mercurial.selenic.com website has been down for hours. All the download links on http://pypi.python.org/simple/Mercurial/ point at things like http://mercurial.selenic.com/release/mercurial-1.5.tar.gz
>
> I'm very happy to have a local buildout egg cache, otherwise the mercurial website's failure would bring a couple of my buildouts to a grinding halt.
>
> A couple of those projects that don't bother to put their packages on pypi can bring your pip or buildout *down* quite often.
>
> Reinout

I've been promoting a similar workflow with pip wheel (a proposed command present in the wheel fork of pip):

    pip wheel -w /wheel/directory dependency
    pip install --no-index --find-links /wheel/directory dependency

You wind up with cached builds for every package you are using and its dependencies, and only consult the index when you are willing to be surprised.
Re: [Catalog-sig] Deprecate External Links
On Thu, Feb 28, 2013 at 10:30 AM, Lennart Regebro rege...@gmail.com wrote:
> On Thu, Feb 28, 2013 at 10:43 AM, Lennart Regebro rege...@gmail.com wrote:
>> On Thu, Feb 28, 2013 at 9:28 AM, Nick Coghlan ncogh...@gmail.com wrote:
>>> Pissing off the maintainers of packages that currently rely on external hosting by telling them they have to change their release processes if they want to keep releasing software on PyPI and have their users actually be able to download it is *not* a good idea, especially when we're about to ask them to upgrade their build chains for other reasons (including both security and reliability).
>>
>> Who are these people by the way?
>
> I can answer that question now. I have a list of 2651 emails of people listed as maintainers or authors of software that doesn't have releases on PyPI. This is a very inclusive list, so it lists *all* maintainers and authors of *all* versions of a package, if that package has no files on PyPI. And there are duplicate people, of course, although the emails are unique.
>
> I've suggested before that we start by sending out emails to these people, but I have to admit that the list is *much* longer than I thought, and that we might want to limit it to those who actually have packages that have been accessed during the last X months or so.
>
> //Lennart

Looking at some of the packages on Donald's link (https://crate.io/externally-hosted/), some of the websites are just plain broken. Those authors should potentially be contacted separately about completely removing their package from PyPI (assuming they've stopped development or no longer make the project available).
Re: [Catalog-sig] Deprecate External Links
On Feb 28, 2013, at 3:43 AM, Nick Coghlan wrote:
> On Thu, Feb 28, 2013 at 6:12 PM, M.-A. Lemburg m...@egenix.com wrote:
>> On 28.02.2013 07:39, Nick Coghlan wrote:
>>> 1. The next generation metadata infrastructure will NOT support external hosting of files indexed on PyPI - if you don't upload the archive files to PyPI, they won't be included in the next generation metadata. If you want external hosting, you will need to run a separate index (this is similar to the yum model - you can host files wherever you want, but you need to run yum createrepo yourself to generate the metadata, and instruct users on how to get their installers to retrieve your metadata). The big difference between PyPI and the yum model is that the default index still won't be curated at all, so there's no review process to get through if you want to use it, and thus less need for external hosting.
>>
>> Could you elaborate on this? AFAIK, the metadata only works on package names, regardless of where an installer finds them.
>
> Caveat: this is NOT a final design, and people that aren't me will be working out the exact details. It is, however, how I want it to work.
>
> The next generation metadata publication infrastructure is likely to be based on TUF, and thus will consist of pregenerated, signed metadata served as static files. Installers will just download metadata files, sdists and wheels (and probably eggs and tar balls), and never need to contact an active web service.

It sounds like that move will also be a good opportunity to create a reusable PyPI client library that the installer front-ends (easy_install, pip, whatever) could use, to provide some consistent behavior between the tools. I would like to see it *only* work with PyPI to upload, search, and download distributions, but not create distributions, not find them anywhere else, and not upload them anywhere else.

Doug

> The only active web service technically required will be one to regularly refresh the signed timestamp file that prevents certain kinds of attacks based on providing old, insecure versions of software (a cron job running on the server hosting the metadata would suffice for this task). PyPI itself will have another active service to automatically regenerate the metadata when files are uploaded by maintainers.
>
> The delegation of trust within the framework will be defined only for files hosted by PyPI - it will not be extended to allow the declaration of external URLs as a source for the target files. Publishers will still be able to publish on external sites, but they will need to generate their own metadata, and distributions published that way won't be indexed in the next generation metadata on PyPI.
>
> This is the same way yum repos work - the metadata for each repo only covers SRPMs and RPMs hosted in that repo. If you want to download software from somewhere else, you have to add another repo definition in the client so it knows where to look for the metadata. APT works in a similar fashion.
>
> Cheers,
> Nick.
Re: [Catalog-sig] Deprecate External Links
On Thu, Feb 28, 2013 at 16:30 +0100, Lennart Regebro wrote:
> On Thu, Feb 28, 2013 at 10:43 AM, Lennart Regebro rege...@gmail.com wrote:
>> On Thu, Feb 28, 2013 at 9:28 AM, Nick Coghlan ncogh...@gmail.com wrote:
>>> Pissing off the maintainers of packages that currently rely on external hosting by telling them they have to change their release processes if they want to keep releasing software on PyPI and have their users actually be able to download it is *not* a good idea, especially when we're about to ask them to upgrade their build chains for other reasons (including both security and reliability).
>>
>> Who are these people by the way?
>
> I can answer that question now. I have a list of 2651 emails of people listed as maintainers or authors of software that doesn't have releases on PyPI. This is a very inclusive list, so it lists *all* maintainers and authors of *all* versions of a package, if that package has no files on PyPI. And there are duplicate people, of course, although the emails are unique.

There are also packages which have some (older) release files on pypi and newer ones outside (e.g. lockfile with 78256 downloads from code.google.com). You didn't include such in your 2651 emails, or did you?

holger
Re: [Catalog-sig] Deprecate External Links
On Thu, Feb 28, 2013 at 13:56 +0100, Reinout van Rees wrote:
> On 28-02-13 10:43, holger krekel wrote:
>> On Thu, Feb 28, 2013 at 06:38 +0100, Andreas Jung wrote:
>>> I don't give a shit about the arguments pulled out every time by package maintainers using PyPI only for listing their packages. I am both annoyed and bothered by these people.
>>
>> I didn't see such positions from package maintainers here. In fact I haven't seen anyone stepping up saying listing packages externally is a great idea. Could you point to those posts?
>
> The position Andreas probably means is projects that *do* advertise themselves on pypi, but don't put their files there.

It has been an accepted practice for 10 years.

> I have seen that position in this discussion (I have to upload 120 files per release, so I won't do that, for instance).

Haven't seen that.

> Some arguments might be valid, but these projects *are*, taken as one group, actively breaking pip and buildout regularly.

Yes, and it's annoying, fully agreed.

> So I agree with Andreas. I don't really care about the arguments pulled out every time. Effectively actively breaking pip and buildout is bad, period.

I consider it a valid concern that taking homepage/download URLs away from pypi's server index is likely to break things for users. I don't see the point of doing that if we can have a better migration path by working on the installers (as is currently ongoing). Let's please not have a black-and-white discussion here, and try to improve the overall situation, not just a particular aspect in a particular way.

holger
Re: [Catalog-sig] Deprecate External Links
On Thursday, February 28, 2013 at 1:23 PM, PJ Eby wrote:
> On Thu, Feb 28, 2013 at 4:08 AM, Nick Coghlan ncogh...@gmail.com wrote:
>> On Thu, Feb 28, 2013 at 7:00 PM, holger krekel hol...@merlinux.eu wrote:
>>> To summarize, having pip/easy_install report red warnings and requiring users to pass a --htmlscrape=PROJ1,PROJ2 option or so is a good way to communicate; removing the ability is not, at this point.
>>
>> +1
>>
>> I'm a fan of updating the client side tools (both upload and download) to complain if files are not hosted on PyPI, and perhaps even requiring switches or configuration settings to say "yes, external downloads are OK for projects X, Y, and Z".
>>
>> I'm *not* a fan of changing the way PyPI handles external links, except perhaps for some of the suggestions PJE made about cleaning up some aspects of what PyPI chooses to publish for old releases.
>>
>> I'd prefer to leave the "you can't do it any more" step for the next generation secure metadata distribution infrastructure (so the installation tools will be able to fall back to the legacy infrastructure for projects that haven't updated yet).
>
> Indeed. I'm hoping that the new tools will make the old ones (e.g. setuptools) entirely irrelevant, which is why I'm hammering so hard in the PEP discussions on some use cases that eggs do well that wheels don't. I don't want people to have to keep using setuptools for those use cases (e.g. simple plugin deployment a la Trac).
>
> If the new tools handle all of the use cases, then setuptools can die a natural death sometime in the next decade or so, so I don't have to be responsible for it when I turn old and senile. (It's already turned me grey as it is.) ;-)
>
> For the short run, I anticipate the following steps in the next release of setuptools, which I'm aiming to release before PyCon:
>
> * Default to SSL URL for PyPI
> * Support SSL certificate verification for downloads if the 'requests' library is available on sys.path
> * Update docs for easy_install to more clearly and prominently state that packages are downloaded from other sources than PyPI unless --allow-hosts is used
> * Add an immediate warning to each easy_install invocation (whether programmatic or command line) if --allow-hosts is not explicitly set to some value in the configuration or command line.
>
> I'm also considering adding a warning for scraping home page links, but at this point in the discussion haven't nailed down how that should work.
>
> Likewise, I'd like to provide some sort of monkeypatch to make register/upload work properly with SSL in older Pythons, but I'm not sure I can integrate cert checking there... but at least the security will be no worse than using plain distutils. (i.e., it'll still be subject to credential theft if someone MITMs PyPI)

SSL checking on upload should be possible, do you want a patch?

> Of course, this release will initially be available as a development snapshot, i.e., made available through external links. ;-)
>
> Future releases I'm undecided about as yet, but certainly if PyPI becomes able to pull and cache externally published releases (upon a developer's request), that addresses all of my concerns on the developer-burden side, and all of the availability/security concerns on the other. Setuptools could move to a default --allow-hosts of just PyPI, as soon as that feature is available and being used. (And if the licensing issues can be worked out, old packages with external links could be pulled to PyPI anyway, and the external links removed.)
Re: [Catalog-sig] Deprecate External Links
On Thursday, February 28, 2013 at 6:31 PM, PJ Eby wrote:
> On Thu, Feb 28, 2013 at 5:00 PM, Donald Stufft donald.stu...@gmail.com wrote:
>> SSL checking on upload should be possible, do you want a patch?
>
> If it uses the 'requests' library, yes, I'll accept one. But I don't want to do any direct implementation of SSL cert checking in setuptools, at least in the short run (next few weeks), because:

Does setuptools support Python3? (or do you want it to?)

> 1. I don't consider myself qualified as yet to write a correct patch or even verify that a contributed patch is correct/safe, and

There are existing implementations out there that add cert checking to urllib; it's fairly short.

> 2. There is a licensing issue with including the Mozilla root certificate set in setuptools under its current license, and I'm not 100% certain I can *change* the license. (I *could* potentially use a platform-provided cert set, but that's not really an option on Windows unless you have Windows expertise above my paygrade for pulling that stuff out of the registry.)

Shouldn't be any issue: the PSF license is very liberal and the MPL works on a per-file (as opposed to a per-project) basis. So if you include the cert bundle, that particular file is MPL licensed while setuptools itself remains PSF.

> So, by delegating to the requests library, I can bypass both of those issues in the short term. In the longer term (1 month from now), more integrated solutions may be more feasible. Using requests is the best I think I can reasonably achieve by PyCon, but I *will* be publicizing a set of instructions for how to safely download setuptools and requests (via https in a browser to prevent MITM attacks), as well as how to configure easy_install for more secure default settings. (And easy_install will always use requests if present, unless specifically asked not to with a --no-ssl-verify option.)
Re: [Catalog-sig] Deprecate External Links
On Thu, Feb 28, 2013 at 5:00 PM, Donald Stufft donald.stu...@gmail.com wrote:
> SSL checking on upload should be possible, do you want a patch?

If it uses the 'requests' library, yes, I'll accept one. But I don't want to do any direct implementation of SSL cert checking in setuptools, at least in the short run (next few weeks), because:

1. I don't consider myself qualified as yet to write a correct patch or even verify that a contributed patch is correct/safe, and
2. There is a licensing issue with including the Mozilla root certificate set in setuptools under its current license, and I'm not 100% certain I can *change* the license. (I *could* potentially use a platform-provided cert set, but that's not really an option on Windows unless you have Windows expertise above my paygrade for pulling that stuff out of the registry.)

So, by delegating to the requests library, I can bypass both of those issues in the short term. In the longer term (1 month from now), more integrated solutions may be more feasible. Using requests is the best I think I can reasonably achieve by PyCon, but I *will* be publicizing a set of instructions for how to safely download setuptools and requests (via https in a browser to prevent MITM attacks), as well as how to configure easy_install for more secure default settings. (And easy_install will always use requests if present, unless specifically asked not to with a --no-ssl-verify option.)
Re: [Catalog-sig] Deprecate External Links
On Thu, Feb 28, 2013 at 8:52 PM, holger krekel hol...@merlinux.eu wrote:
> There are also packages which have some (older) release files on pypi and newer ones outside (e.g. lockfile with 78256 downloads from code.google.com). You didn't include such in your 2651 emails, or did you?

No, I didn't, I assumed they would be quite few. Possibly a better algorithm is to check if the last release has files on PyPI.

//Lennart
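The "check the last release" idea could be sketched roughly as follows. This is only an illustration: the function name and the shape of the input (a list of `(version, files_on_pypi)` tuples, oldest first) are assumptions made up for the example, not anything PyPI actually exposes in this form.

```python
def latest_release_is_external(releases):
    """Return True if the newest release has no files hosted on PyPI.

    `releases` is a hypothetical list of (version, files_on_pypi)
    tuples ordered oldest first; files_on_pypi is a list of filenames.
    """
    if not releases:
        # A project with no releases has nothing hosted externally.
        return False
    _version, files_on_pypi = releases[-1]
    return len(files_on_pypi) == 0


# An invented project that uploaded 1.0 to PyPI but hosts 1.1 elsewhere.
history = [
    ("1.0", ["somepkg-1.0.tar.gz"]),
    ("1.1", []),
]
print(latest_release_is_external(history))  # True
```

Unlike the "no files at all on PyPI" filter, this would also catch projects in the situation holger describes, where only older releases were uploaded.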
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 10:26 AM, Donald Stufft wrote:
> PyPI is now being served with a valid SSL certificate, and the tooling has begun to incorporate SSL verification of PyPI into the process. This is _excellent_ and the parties involved should all be thanked. However, there is still another massive area of insecurity within the packaging tool chain. For those who don't know, when you attempt to install a particular package a number of URLs are visited. The steps look roughly like this:
>
> 1. Visit http://pypi.python.org/simple/Package/ and attempt to collect any links that look installable (tarballs, #egg=, etc.). (Note: /simple/Package/ contains download_url, home_page, and any link that is contained in the long_description.)
> 2. Visit any link referenced as home_page and attempt to collect any links that look installable.
> 3. Visit any link referenced in a dependency_links and attempt to collect any links that look installable.
> 4. Take all of the collected links, determine which one best matches the requirement spec given, and download it.
> 5. Rinse and repeat for every dependency in the requirement set.
>
> I propose we deprecate the external links that PyPI has published on the /simple/ indexes, which exist because of the history of PyPI. Ideally in some number of months (1? 2?) we would turn off adding these links from new releases, leaving the existing ones intact, and then a few months later the existing links would be removed completely.
>
> Reasoning:
>
> 1. It is difficult to secure the process of spidering external links for download.
>    1a. The only way I can think of offhand is by requiring uploading a hash of the expected files to PyPI along with the download link and removing all URLs except for the download_url. This has the effect that only 1 file can be associated with a particular release.
> 2. External links decrease the expected uptime for a particular set of requirements. PyPI itself has become very stable; however, the same cannot be said for all of the hosts linked that the toolchain processes. Each new host is an additional SPOF. Ex: I depend on PyPI and 10 other external packages, each service has a 99% uptime, so my expected uptime to be able to install all my requirements would be ~89% (0.99 ** 11).
> 3. Breaks the ability for a CDN and/or mirroring infrastructure to provide increased uptime and better latency/throughput across the globe.
> 4. Privacy implications: as a user it is not particularly obvious when I run `pip install Foo` what hosts I will be issuing requests against. It is obvious that I will be contacting PyPI and I will have made the decision to trust PyPI; however, it is not obvious what other hosts will be able to gather information about me, including what packages I am installing. This becomes even more difficult to determine the deeper my dependency tree goes.

I fully support this. As an aside, if CDN/storage concerns are an issue, I have an outstanding offer from a large hosting company to take care of the CDN aspects for us.

Jesse
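The uptime estimate in point 2 of the quoted proposal can be reproduced in a few lines. This is a sketch under the same simplification the example makes: every host fails independently and has identical availability. The function name is made up for illustration.

```python
def expected_install_uptime(per_host_uptime, host_count):
    """Chance that every host needed for an install is up at once.

    Assumes independent failures and the same availability for each
    host, which is the simplification used in the quoted example.
    """
    return per_host_uptime ** host_count


# PyPI plus 10 externally hosted packages, each at 99% uptime.
combined = expected_install_uptime(0.99, 11)
print(f"{combined:.1%}")  # 89.5%
```

Because the availabilities multiply, every additional host can only lower the result, which is the point being made about additional SPOFs.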
Re: [Catalog-sig] Deprecate External Links
On 27.02.2013 16:26, Donald Stufft wrote:
> PyPI is now being served with a valid SSL certificate, and the tooling has begun to incorporate SSL verification of PyPI into the process. This is _excellent_ and the parties involved should all be thanked. However, there is still another massive area of insecurity within the packaging tool chain. For those who don't know, when you attempt to install a particular package a number of URLs are visited. The steps look roughly like this:
>
> 1. Visit http://pypi.python.org/simple/Package/ and attempt to collect any links that look installable (tarballs, #egg=, etc.). (Note: /simple/Package/ contains download_url, home_page, and any link that is contained in the long_description.)
> 2. Visit any link referenced as home_page and attempt to collect any links that look installable.
> 3. Visit any link referenced in a dependency_links and attempt to collect any links that look installable.
> 4. Take all of the collected links, determine which one best matches the requirement spec given, and download it.
> 5. Rinse and repeat for every dependency in the requirement set.
>
> I propose we deprecate the external links that PyPI has published on the /simple/ indexes, which exist because of the history of PyPI. Ideally in some number of months (1? 2?) we would turn off adding these links from new releases, leaving the existing ones intact, and then a few months later the existing links would be removed completely.

-1.

There are many reasons for not hosting packages and distributions on PyPI itself.

If you use and trust a package, you also have to know and trust its dependencies, no matter where they are hosted, so you're not gaining any security by disabling links to other download locations: if you don't trust the way a package is hosted, you don't use it; if you do, then removing the link breaks the package and all its dependencies.

Instead of suggesting removing support for externally hosted packages, why not propose a mechanism to provide a more direct/secure way to reference them?

--
Marc-Andre Lemburg
eGenix.com
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 10:39 AM, M.-A. Lemburg wrote:
> -1.
>
> There are many reasons for not hosting packages and distributions on PyPI itself.
>
> If you use and trust a package, you also have to know and trust its dependencies, no matter where they are hosted, so you're not gaining any security by disabling links to other download locations: if you don't trust the way a package is hosted, you don't use it; if you do, then removing the link breaks the package and all its dependencies.

You also have to know and trust the hosting locations for all of them, and if they are not available via SSL you have to know and trust that there is no MITM on the connection.

> Instead of suggesting removing support for externally hosted packages, why not propose a mechanism to provide a more direct/secure way to reference them?

I did mention a method for doing that in my email. However, there are reasons beyond the security ones to require packages to be hosted on PyPI: namely uptime, privacy, and performance.
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 8:26 AM, Donald Stufft donald.stu...@gmail.com wrote:
> PyPI is now being served with a valid SSL certificate, and the tooling has begun to incorporate SSL verification of PyPI into the process. This is _excellent_ and the parties involved should all be thanked. However, there is still another massive area of insecurity within the packaging tool chain. For those who don't know, when you attempt to install a particular package a number of URLs are visited. The steps look roughly like this:
>
> 1. Visit http://pypi.python.org/simple/Package/ and attempt to collect any links that look installable (tarballs, #egg=, etc.). (Note: /simple/Package/ contains download_url, home_page, and any link that is contained in the long_description.)
> 2. Visit any link referenced as home_page and attempt to collect any links that look installable.
> 3. Visit any link referenced in a dependency_links and attempt to collect any links that look installable.
> 4. Take all of the collected links, determine which one best matches the requirement spec given, and download it.
> 5. Rinse and repeat for every dependency in the requirement set.
>
> I propose we deprecate the external links that PyPI has published on the /simple/ indexes, which exist because of the history of PyPI. Ideally in some number of months (1? 2?) we would turn off adding these links from new releases, leaving the existing ones intact, and then a few months later the existing links would be removed completely.
>
> Reasoning:
>
> 1. It is difficult to secure the process of spidering external links for download.
>    1a. The only way I can think of offhand is by requiring uploading a hash of the expected files to PyPI along with the download link and removing all URLs except for the download_url. This has the effect that only 1 file can be associated with a particular release.
> 2. External links decrease the expected uptime for a particular set of requirements. PyPI itself has become very stable; however, the same cannot be said for all of the hosts linked that the toolchain processes. Each new host is an additional SPOF. Ex: I depend on PyPI and 10 other external packages, each service has a 99% uptime, so my expected uptime to be able to install all my requirements would be ~89% (0.99 ** 11).
> 3. Breaks the ability for a CDN and/or mirroring infrastructure to provide increased uptime and better latency/throughput across the globe.
> 4. Privacy implications: as a user it is not particularly obvious when I run `pip install Foo` what hosts I will be issuing requests against. It is obvious that I will be contacting PyPI and I will have made the decision to trust PyPI; however, it is not obvious what other hosts will be able to gather information about me, including what packages I am installing. This becomes even more difficult to determine the deeper my dependency tree goes.

5. This is a serious PITA for package maintainers. If you accidentally upload a file somewhere else that looks like a newer version, pip will install that.

6. It's a huge security hole. For someone to upload a malicious package, they just have to access some site that is crawled by pip, which includes all old download sites. If someone used to use some download domain, but they no longer own it, it is very easy for someone to upload an arbitrary malicious file with a slightly newer version number, and pip will happily install that for everyone.

This was discussed at http://mail.python.org/pipermail/catalog-sig/2012-June/004518.html. My suggestion was to only download from the explicit external download link for the latest listed version, and to do so only if an upload didn't exist.

At the very least, let package maintainers manually enable this behavior, so that we don't have to worry about tricking pip/easy_install into installing the right thing by version number naming (which is completely broken btw. It's impossible to name separate Python 2 and Python 3 packages so that both pip and easy_install will do the right thing in every case. See https://code.google.com/p/sympy/issues/detail?id=3511).

Aaron Meurer
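The suggestion above (use only the explicit download link of the latest listed version, and only when nothing was uploaded) might look something like the sketch below. The `releases` structure, its key names, and the example hostnames are hypothetical and exist only for this illustration; this is not how pip or easy_install select files.

```python
def pick_download(releases):
    """Return the single URL to fetch for the latest listed version.

    `releases` is a hypothetical list of dicts, newest last, each with
    a 'version', a list of 'pypi_files' URLs, and an optional
    'download_url'. No home page or other site is ever crawled.
    """
    if not releases:
        return None
    latest = releases[-1]
    if latest["pypi_files"]:
        # A file uploaded to PyPI always wins over an external link.
        return latest["pypi_files"][0]
    # Otherwise fall back to the one explicit external link, if any.
    return latest.get("download_url")


releases = [
    {"version": "1.0",
     "pypi_files": ["https://pypi.example.invalid/pkg-1.0.tar.gz"],
     "download_url": None},
    {"version": "1.1",
     "pypi_files": [],
     "download_url": "https://downloads.example.invalid/pkg-1.1.tar.gz"},
]
print(pick_download(releases))
```

Under this rule a stray file on an old, abandoned download site is never considered, because the choice is driven by the index's own list of versions rather than by whichever crawled link carries the highest version number.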
Re: [Catalog-sig] Deprecate External Links
On 27 Feb, 2013, at 16:42, Donald Stufft donald.stu...@gmail.com wrote:
> On Wednesday, February 27, 2013 at 10:39 AM, M.-A. Lemburg wrote:
>> -1.
>>
>> There are many reasons for not hosting packages and distributions on PyPI itself.
>>
>> If you use and trust a package, you also have to know and trust its dependencies, no matter where they are hosted, so you're not gaining any security by disabling links to other download locations: if you don't trust the way a package is hosted, you don't use it; if you do, then removing the link breaks the package and all its dependencies.
>
> You also have to know and trust the hosting locations for all of them, and if they are not available via SSL you have to know and trust that there is no MITM on the connection.

The security bits are still in flux; AFAIK neither proposal will require SSL for the actual download to be secure.

>> Instead of suggesting removing support for externally hosted packages, why not propose a mechanism to provide a more direct/secure way to reference them?
>
> I did mention a method for doing that in my email. However, there are reasons beyond the security ones to require packages to be hosted on PyPI: namely uptime, privacy, and performance.

You only mentioned restricting downloads to the 'Download-URL' property in the package metadata. Another alternative would be to add a PyPI API for registering specific downloads, with the same restrictions on filenames as for files hosted by PyPI itself. With that, PyPI could be queried for the exact downloads associated with a release instead of having to perform screen scraping.

At this time I don't know if requiring that files are hosted on PyPI is a good idea; as Marc-Andre said, there are reasons for hosting them elsewhere. That might change when the package signing infrastructure is further specified.

Ronald

P.S. And only using downloads hosted on PyPI doesn't require changes to PyPI anyway, just patches to pip and setuptools :-)
Re: [Catalog-sig] Deprecate External Links
> pip/easy_install into installing the right thing by version number naming (which is completely broken btw. It's impossible to name separate Python 2 and Python 3 packages so that both pip and easy_install will do the right thing in every case. See https://code.google.com/p/sympy/issues/detail?id=3511).

To be clear, in this issue it is easy_install that is broken, but I understand you want something that works consistently across both tools.

Marcus
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 11:34 AM, M.-A. Lemburg wrote: On 27.02.2013 16:42, Donald Stufft wrote: On Wednesday, February 27, 2013 at 10:39 AM, M.-A. Lemburg wrote: -1. There are many reasons for not hosting packages and distributions on PyPI itself. If you use and trust a package, you also have to know and trust its dependencies, no matter where they are hosted, so you're not gaining any security by disabling links to other download locations: if you don't trust the way a package is hosted, you don't use it; if you do, then removing the link breaks the package and all its dependencies. You also have to know and trust the hosting locations for all of them, and if they are not available via SSL you have to know and trust that there is not a MITM available. Right. I'm not saying that it's not a good idea to host packages on PyPI, but forcing the community into doing this is not a good idea. Instead of suggesting to removing support for externally hosted packages, why not propose a mechanism to provide a more direct/secure way to reference them ? I did mention a method for doing that in my email. However there are reasons beyond the security ones to require packages being hosted on PyPI. Namely uptime, privacy, and performance. Your proposed uploading of hash values would require listing all distribution files for each release somehow. I don't see how you'd get that to work with older Python versions. 1. It is difficult to secure the process of spidering external links for download. 1a. The only way I can think offhand is by requiring uploading a hash of the expected files to PyPI along with the download link and removing all urls except for the download_url. This has the effect that only 1 file can be associated with a particular release. Uptime and performance have in the past been one of the reasons why people chose not to upload files to PyPI. This could be changed, of course. I don't see how. 
If PyPI goes down then the packaging tools cannot query /simple/foo/ to see the external links. Adding in additional SPOF's only harms uptime, there is no possible way for it to increase it. Another reason for not uploading files to PyPI are the license terms you have to agree to on PyPI and the fact that you can no longer control where your distribution files are made available by agreeing to them. This could be changed as well, but we'd need to add more legalese to the PyPI mirror setup for this to work... not sure whether people providing the mirrors would like this. The legalese doesn't particularly give any more rights than any free/OSS license does. There's not a requirement currently that packages on PyPI be free/OSS but this change would only actually affect people who want to upload non free code to PyPI. Security can be had by having installers check the GPG signatures of distribution file. You don't need to trust the download site for that. GPG signatures are good, we don't have them yet. And when we do it's only 1 layer of defense, not the final solution. I'm not sure what you meant with privacy in this context. If I download something from server there is a certain amount of information that by nature of HTTP and networking gets leaked to that host. Additionally if it's done via non TLS connections it also gets leaked to anyone who has a MITM on my connection. This is especially important in countries where the government actively surveils or modifies the traffic of their citizens. Something that would work even with older Python versions is letting the download URL point to a meta-file which contains the links to the other distribution files. That way you avoid having the installers trying to parse arbitrary websites and you can add more security to the downloads by providing hash values, etc. in those meta-files. Since installers already know how to parse the /simple/ (HTML) index files, we might use that same format for those meta-files. 
-- Marc-Andre Lemburg eGenix.com
Re: [Catalog-sig] Deprecate External Links
On 27.02.2013 17:43, Donald Stufft wrote: On Wednesday, February 27, 2013 at 11:34 AM, M.-A. Lemburg wrote: On 27.02.2013 16:42, Donald Stufft wrote: On Wednesday, February 27, 2013 at 10:39 AM, M.-A. Lemburg wrote: -1. There are many reasons for not hosting packages and distributions on PyPI itself. If you use and trust a package, you also have to know and trust its dependencies, no matter where they are hosted, so you're not gaining any security by disabling links to other download locations: if you don't trust the way a package is hosted, you don't use it; if you do, then removing the link breaks the package and all its dependencies. You also have to know and trust the hosting locations for all of them, and if they are not available via SSL you have to know and trust that there is not a MITM available. Right. I'm not saying that it's not a good idea to host packages on PyPI, but forcing the community into doing this is not a good idea. Instead of suggesting to removing support for externally hosted packages, why not propose a mechanism to provide a more direct/secure way to reference them ? I did mention a method for doing that in my email. However there are reasons beyond the security ones to require packages being hosted on PyPI. Namely uptime, privacy, and performance. Your proposed uploading of hash values would require listing all distribution files for each release somehow. I don't see how you'd get that to work with older Python versions. 1. It is difficult to secure the process of spidering external links for download. 1a. The only way I can think offhand is by requiring uploading a hash of the expected files to PyPI along with the download link and removing all urls except for the download_url. This has the effect that only 1 file can be associated with a particular release. Uptime and performance have in the past been one of the reasons why people chose not to upload files to PyPI. This could be changed, of course. I don't see how. 
If PyPI goes down then the packaging tools cannot query /simple/foo/ to see the external links. Adding in additional SPOF's only harms uptime, there is no possible way for it to increase it. Package installers only need access to the static files in the /simple/ index. Those can be put behind a CDN to increase uptime. PyPI itself doesn't have to be up and running if you just want to download the files (unfortunately, that's not true at the moment, because the /simple/ index is dynamically generated, but that can be changed). See http://wiki.python.org/moin/CloudPyPI for details. Another reason for not uploading files to PyPI are the license terms you have to agree to on PyPI and the fact that you can no longer control where your distribution files are made available by agreeing to them. This could be changed as well, but we'd need to add more legalese to the PyPI mirror setup for this to work... not sure whether people providing the mirrors would like this. The legalese doesn't particularly give any more rights than any free/OSS license does. There's not a requirement currently that packages on PyPI be free/OSS but this change would only actually affect people who want to upload non free code to PyPI. It does affect any package author, regardless of the license. Some examples: * you may be forced remove a distribution from the net (think DMCA, patents, trademarks, etc) * the distribution may contain a serious bug that you don't want to spread * you may want to keep more accurate statistics of the reach of your project Security can be had by having installers check the GPG signatures of distribution file. You don't need to trust the download site for that. GPG signatures are good, we don't have them yet. And when we do it's only 1 layer of defense, not the final solution. Sure, you still have to trust the author :-) I'm not sure what you meant with privacy in this context. 
If I download something from server there is a certain amount of information that by nature of HTTP and networking gets leaked to that host. Additionally if it's done via non TLS connections it also gets leaked to anyone who has a MITM on my connection. This is especially important in countries where the government actively surveils or modifies the traffic of their citizens. I can see an issue with e.g. trying to download code that is illegal to use in a country (e.g. crypto code, exploits, hacks, etc.), but the country officials would probably just block the complete PyPI site than bother with filtering single requests. IMO, that's beyond the scope of what we're discussing here, though. Something that would work even with older Python versions is letting the download URL point to a meta-file which contains the links to the other distribution files. That way you avoid having the installers trying to parse arbitrary websites and you can add more security to the downloads by
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 12:10 PM, M.-A. Lemburg wrote: Package installers only need access to the static files in the /simple/ index. Those can be put behind a CDN to increase uptime. PyPI itself doesn't have to be up and running if you just want to download the files (unfortunately, that's not true at the moment, because the /simple/ index is dynamically generated, but that can be changed). See http://wiki.python.org/moin/CloudPyPI for details. I'm aware of that, but that doesn't change the statement. If /simple/ is down you cannot determine the external urls. There is no way to increase uptime by adding more points of failure. Another reason for not uploading files to PyPI are the license terms you have to agree to on PyPI and the fact that you can no longer control where your distribution files are made available by agreeing to them. This could be changed as well, but we'd need to add more legalese to the PyPI mirror setup for this to work... not sure whether people providing the mirrors would like this. The legalese doesn't particularly give any more rights than any free/OSS license does. There's not a requirement currently that packages on PyPI be free/OSS but this change would only actually affect people who want to upload non free code to PyPI. It does affect any package author, regardless of the license. Some examples: * you may be forced remove a distribution from the net (think DMCA, patents, trademarks, etc) IANAL but I'm pretty sure if any of those things occur you didn't have the legal right to grant that license to the PSF and the PSF would be required to take them down anyways. * the distribution may contain a serious bug that you don't want to spread This is a completely separate issue. PyPI supports (and always will) a method of saying delete and/or don't install this. This is really just a strawman. * you may want to keep more accurate statistics of the reach of your project What statistics do you want? 
Let's have PyPI produce them and properly anonymize them instead of leaking data. Security can be had by having installers check the GPG signatures of distribution file. You don't need to trust the download site for that. GPG signatures are good, we don't have them yet. And when we do it's only 1 layer of defense, not the final solution. Sure, you still have to trust the author :-) But do I need to trust his host? Do I need to trust that his laptop didn't get swiped and with it his GPG key? Ideally I don't *need* to trust the author either. I download his package from PyPI and I can review it, then I know it's fine and I can download that version and use it. PyPI isn't to the point you can make that assumption but It should get there. I'm not sure what you meant with privacy in this context. If I download something from server there is a certain amount of information that by nature of HTTP and networking gets leaked to that host. Additionally if it's done via non TLS connections it also gets leaked to anyone who has a MITM on my connection. This is especially important in countries where the government actively surveils or modifies the traffic of their citizens. I can see an issue with e.g. trying to download code that is illegal to use in a country (e.g. crypto code, exploits, hacks, etc.), but the country officials would probably just block the complete PyPI site than bother with filtering single requests. IMO, that's beyond the scope of what we're discussing here, though. It's not just crypto code, exploits, hacks it's also things like https://ooni.torproject.org/ and Tor itself which are *good* projects that certain governments might not particularly like. Something that would work even with older Python versions is letting the download URL point to a meta-file which contains the links to the other distribution files. 
That way you avoid having the installers trying to parse arbitrary websites and you can add more security to the downloads by providing hash values, etc. in those meta-files. Since installers already know how to parse the /simple/ (HTML) index files, we might use that same format for those meta-files. So what do you think of the above idea? If the hashes are on the external system then they are as good as useless. If I'm able to do something nefarious with the packages that are hosted I can do something nefarious with the metadata file. Putting the hashes on PyPI fixes the security issue (because we have a real SSL cert, and tools are starting to validate it) but doesn't fix the other issues. -- Marc-Andre Lemburg eGenix.com
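The point about where the hash lives can be illustrated with a minimal sketch. It assumes the expected digest was obtained from PyPI over a verified TLS connection while the file itself came from an external host; the function name `matches_published_hash` and the choice of SHA-256 are illustrative assumptions, not anything pip or PyPI define.

```python
import hashlib
import hmac


def matches_published_hash(data: bytes, published_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a downloaded file's digest with the digest published by the index."""
    digest = hashlib.new(algorithm, data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(digest, published_hex.lower())


# Stand-in for an externally hosted distribution file (hypothetical content).
payload = b"pretend this is Foo-1.0.tar.gz"
# In the scheme discussed above this value would be fetched from PyPI,
# not from the host that serves the file.
published = hashlib.sha256(payload).hexdigest()

print(matches_published_hash(payload, published))
print(matches_published_hash(payload + b" tampered", published))
```

If the digest were served by the same external host as the file, anyone able to swap the file could swap the digest too, which is the objection raised in the message.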
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 12:22 PM, holger krekel wrote: The main means of securing against tampering is author-signatures and verification by installers. If we have that, the download location does not matter (pypi/CDN/google/...). Again, we don't have that yet, it's only 1 layer, and that doesn't solve all of the issues with external packages. 2. External links decrease the expected uptime for a particular set of requirements. PyPI itself has become very stable, however the same cannot be said for all of the hosts linked that the toolchain processes. Each new host is an additional SPOF. Ex: I depend on PyPI and 10 other external packages, each service has a 99% uptime so my expected uptime to be able to install all my requirements would be ~89% (0.99 ** 11). There are many links which go to google, bitbucket or github - i doubt those services have worse availability than pypi.python.org, rather better. Doesn't matter if they have worse or better, you cannot increase availability by adding more points of failure, at best you keep it the same, typically you decrease it. Also we would be losing a lot of packages because i expect there to be a non-trivial amount of packages which will not be transferred to pypi.python.org no matter how much people here think it's cool. Why not first have a good infrastructure and capacity with pypi.python.org so that people *want* to move their files there? PyPI has had very good uptime since the move to OSL. I don't have numbers handy but I believe I can get them. best, holger 3. Breaks the ability for a CDN and/or mirroring infrastructure to provide increased uptime and better latency/throughput across the globe. 4. Privacy implications, as a user it is not particularly obvious when I run `pip install Foo` what hosts I will be issuing requests against.
It is obvious that I will be contacting PyPI and I will have made the decision to trust PyPI, however it is not obvious what other hosts will be able to gather information about me, including what packages I am installing. This becomes even more difficult to determine the deeper my dependency tree goes.
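The availability arithmetic in point 2 can be reproduced with a short sketch. It assumes every host fails independently and has the same uptime, which is a simplification; the function name `combined_availability` is made up for illustration.

```python
def combined_availability(per_service: float, services: int) -> float:
    """Probability that all `services` independent hosts are reachable at once."""
    if not 0.0 <= per_service <= 1.0:
        raise ValueError("per_service must be between 0 and 1")
    if services < 1:
        raise ValueError("services must be at least 1")
    return per_service ** services


# PyPI alone versus PyPI plus 10 external hosts, each assumed to be up 99% of the time.
pypi_only = combined_availability(0.99, 1)
pypi_plus_ten = combined_availability(0.99, 11)

print(f"PyPI only:                {pypi_only:.1%}")
print(f"PyPI + 10 external hosts: {pypi_plus_ten:.1%}")
```

Because each factor is at most 1, the product can never exceed the availability of PyPI alone, whatever the uptime of the extra hosts, which is the argument being made about additional points of failure.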
Re: [Catalog-sig] Deprecate External Links
On Feb 27, 2013, at 10:22 AM, holger krekel hol...@merlinux.eu wrote: On Wed, Feb 27, 2013 at 10:26 -0500, Donald Stufft wrote: PyPI is now being served with a valid SSL certificate, and the tooling has begun to incorporate SSL verification of PyPI into the process. This is _excellent_ and the parties involved should all be thanked. However there is still another massive area of insecurity within the packaging tool chain. For those who don't know, when you attempt to install a particular package a number of urls are visited. The steps look roughly something like this: 1. Visit http://pypi.python.org/simple/Package/ and attempt to collect any links that look like it's installable (tarballs, #egg=, etc). Note: /simple/Package/ contains download_url, home_page, and any link that is contained in the long_description). 2. Visit any link referenced as home_page and attempt to collect any links that look like it's installable. 3. Visit any link referenced in a dependency_links and attempt to collect any links that look like it's installable. 4. Take all of the collected links and determine which one best matches the requirement spec given and download it. 5. Rinse and repeat for every dependency in the requirement set. I propose we deprecate the external links that PyPI has published on the /simple/ indexes which exist because of the history of PyPI. Ideally in some number of months (1? 2?) we would turn off adding these links from new releases, leaving the existing ones intact and then a few months later the existing links be removed completely. Reasoning: 1. It is difficult to secure the process of spidering external links for download. 1a. The only way I can think offhand is by requiring uploading a hash of the expected files to PyPI along with the download link and removing all urls except for the download_url. This has the effect that only 1 file can be associated with a particular release. 
The main means of securing against tampering is author-signatures and verification by installers. If we have that, the download location does not matter (pypi/CDN/google/...). 2. External links decrease the expected uptime for a particular set of requirements. PyPI itself has become very stable, however the same cannot be said for all of the hosts linked that the toolchain processes. Each new host is an additional SPOF. Ex: I depend on PyPI and 10 other external packages, each service has a 99% uptime so my expected uptime to be able to install all my requirements would be ~89% (0.99 ** 11). There are many links which go to google, bitbucket or github - i doubt those services have worse availability than pypi.python.org, rather better. Also we would be losing a lot of packages because i expect there to be a non-trivial amount of packages which will not be transferred to pypi.python.org no matter how much people here think it's cool. Why not first have a good infrastructure and capacity with pypi.python.org so that people *want* to move their files there? If you change the policy to also download links, but only official links actually manually put there by the package maintainer, no crawling, isn't it fair to say, if you want pip to install your package, you need to tell PyPI where it is, explicitly? And if you release a new version, you need to tell PyPI about that new version, or else it will continue to install the old version. I suppose they could also just have a link to the latest tarball if they really want to be lazy. PyPI/pip are not like Linux package systems. They should have no prerogative to always try to get the latest version without any work by the package maintainer, especially since there's not a team of people who do it: the whole thing happens automatically by some heuristics. Aaron Meurer best, holger 3. Breaks the ability for a CDN and/or mirroring infrastructure to provide increased uptime and better latency/throughput across the globe. 4.
Privacy implications, as a user it is not particularly obvious when I run `pip install Foo` what hosts I will be issuing requests against. It is obvious that I will be contacting PyPI and I will have made the decision to trust PyPI, however it is not obvious what other hosts will be able to gather information about me, including what packages I am installing. This becomes even more difficult to determine the deeper my dependency tree goes.
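Step 1 of the install process described in this message, and the privacy point about which hosts end up being contacted, can be sketched roughly as follows. This is not pip's or easy_install's actual code: the suffix list, the `LinkCollector` and `split_links` names, and the sample page are all assumptions made for illustration, and nothing is fetched over the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed, non-exhaustive list of file endings that "look installable".
INSTALLABLE_SUFFIXES = (".tar.gz", ".tgz", ".zip", ".egg")


class LinkCollector(HTMLParser):
    """Collect every href on an index page, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def split_links(page_url, html):
    """Return (installable, to_crawl, hosts) for one index page."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    installable, to_crawl = [], []
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.path.endswith(INSTALLABLE_SUFFIXES) or "egg=" in parsed.fragment:
            installable.append(link)
        else:
            # e.g. a home_page link: a spidering installer would fetch and scan it too.
            to_crawl.append(link)
    hosts = sorted({urlparse(link).netloc for link in collector.links})
    return installable, to_crawl, hosts


# Hypothetical /simple/Foo/ page with one hosted file and two external links.
SAMPLE = """
<a href="../../packages/source/F/Foo/Foo-1.0.tar.gz#md5=abc">Foo-1.0.tar.gz</a>
<a href="http://example.com/foo/" rel="homepage">home_page</a>
<a href="http://downloads.example.org/Foo-1.1.zip" rel="download">download_url</a>
"""

installable, to_crawl, hosts = split_links("http://pypi.python.org/simple/Foo/", SAMPLE)
print("installable:", installable)
print("would be crawled:", to_crawl)
print("hosts contacted:", hosts)
```

Even in this toy page, a single `pip install Foo` would touch three hosts, only one of which the user explicitly chose to trust.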
Re: [Catalog-sig] Deprecate External Links
2. External links decrease the expected uptime for a particular set of requirements. PyPI itself has become very stable, however the same cannot be said for all of the hosts linked that the toolchain processes. Each new host is an additional SPOF. Ex: I depend on PyPI and 10 other external packages, each service has a 99% uptime so my expected uptime to be able to install all my requirements would be ~89% (0.99 ** 11). There are many links which go to google, bitbucket or github - i doubt those services have worse availability than pypi.python.org, rather better. Also we would be losing a lot of packages because i expect there to be a non-trivial amount of packages which will not be transferred to pypi.python.org no matter how much people here think it's cool. Why not first have a good infrastructure and capacity with pypi.python.org so that people *want* to move their files there? best, holger Ok, so we have that. What now?
Re: [Catalog-sig] Deprecate External Links
Having different sources for package metadata does pose security concerns, for example version mismatch attacks by a MITM. Unless we co-locate all package metadata at a single source that is trusted for protecting against these issues, this will be an issue.(However, possibly not the biggest threat right now.) I do believe that if you do centralize metadata, you could outsource mirroring the data if desired without losing the other security goals you have. Thanks, Justin On Wed, Feb 27, 2013 at 10:39 AM, M.-A. Lemburg m...@egenix.com wrote: On 27.02.2013 16:26, Donald Stufft wrote: PyPI is now being served with a valid SSL certificate, and the tooling has begun to incorporate SSL verification of PyPI into the process. This is _excellent_ and the parties involved should all be thanked. However there is still another massive area of insecurity within the packaging tool chain. For those who don't know, when you attempt to install a particular package a number of urls are visited. The steps look roughly something like this: 1. Visit http://pypi.python.org/simple/Package/ and attempt to collect any links that look like it's installable (tarballs, #egg=, etc). Note: /simple/Package/ contains download_url, home_page, and any link that is contained in the long_description). 2. Visit any link referenced as home_page and attempt to collect any links that look like it's installable. 3. Visit any link referenced in a dependency_links and attempt to collect any links that look like it's installable. 4. Take all of the collected links and determine which one best matches the requirement spec given and download it. 5. Rinse and repeat for every dependency in the requirement set. I propose we deprecate the external links that PyPI has published on the /simple/ indexes which exist because of the history of PyPI. Ideally in some number of months (1? 2?) 
we would turn off adding these links from new releases, leaving the existing ones intact and then a few months later the existing links be removed completely. -1. There are many reasons for not hosting packages and distributions on PyPI itself. If you use and trust a package, you also have to know and trust its dependencies, no matter where they are hosted, so you're not gaining any security by disabling links to other download locations: if you don't trust the way a package is hosted, you don't use it; if you do, then removing the link breaks the package and all its dependencies. Instead of suggesting the removal of support for externally hosted packages, why not propose a mechanism to provide a more direct/secure way to reference them? -- Marc-Andre Lemburg eGenix.com
Re: [Catalog-sig] Deprecate External Links
Which in particular means that metadata needs to come from PyPI itself, not from the tarball file name. Aaron Meurer On Feb 27, 2013, at 11:06 AM, Justin Cappos jcap...@poly.edu wrote: Having different sources for package metadata does pose security concerns, for example version mismatch attacks by a MITM. Unless we co-locate all package metadata at a single source that is trusted for protecting against these issues, this will be an issue.(However, possibly not the biggest threat right now.) I do believe that if you do centralize metadata, you could outsource mirroring the data if desired without losing the other security goals you have. Thanks, Justin On Wed, Feb 27, 2013 at 10:39 AM, M.-A. Lemburg m...@egenix.com wrote: On 27.02.2013 16:26, Donald Stufft wrote: PyPI is now being served with a valid SSL certificate, and the tooling has begun to incorporate SSL verification of PyPI into the process. This is _excellent_ and the parties involved should all be thanked. However there is still another massive area of insecurity within the packaging tool chain. For those who don't know, when you attempt to install a particular package a number of urls are visited. The steps look roughly something like this: 1. Visit http://pypi.python.org/simple/Package/ and attempt to collect any links that look like it's installable (tarballs, #egg=, etc). Note: /simple/Package/ contains download_url, home_page, and any link that is contained in the long_description). 2. Visit any link referenced as home_page and attempt to collect any links that look like it's installable. 3. Visit any link referenced in a dependency_links and attempt to collect any links that look like it's installable. 4. Take all of the collected links and determine which one best matches the requirement spec given and download it. 5. Rinse and repeat for every dependency in the requirement set. 
I propose we deprecate the external links that PyPI has published on the /simple/ indexes which exist because of the history of PyPI. Ideally in some number of months (1? 2?) we would turn off adding these links from new releases, leaving the existing ones intact and then a few months later the existing links be removed completely. -1. There are many reasons for not hosting packages and distributions on PyPI itself. If you use and trust a package, you also have to know and trust its dependencies, no matter where they are hosted, so you're not gaining any security by disabling links to other download locations: if you don't trust the way a package is hosted, you don't use it; if you do, then removing the link breaks the package and all its dependencies. Instead of suggesting the removal of support for externally hosted packages, why not propose a mechanism to provide a more direct/secure way to reference them? -- Marc-Andre Lemburg eGenix.com
Re: [Catalog-sig] Deprecate External Links
On Feb 27, 2013, at 9:28 AM, M.-A. Lemburg wrote: On 27.02.2013 18:05, Noah Kantrowitz wrote: M.-A. Lemburg m...@egenix.com wrote: I propose we deprecate the external links that PyPI has published on the /simple/ indexes which exist because of the history of PyPI. Ideally in some number of months (1? 2?) we would turn off adding these links from new releases, leaving the existing ones intact and then a few months later the existing links be removed completely. -1. There are many reasons for not hosting packages and distributions on PyPI itself. [citation needed] We've been through this discussion a couple of times in the past. I'm sure the reasons will get listed again in this discussion :-) Too many distribution files for PyPI to handle, Again, please point at a specific package. I wasn't aware that PyPI limited uploads at all, but if it does we can certainly increase the number if there is a good reason. no support for UCS2/UCS4 binary distributions, unsupported distribution file formats (e.g. our prebuilt format), Not sure why PyPI would even care what charset the package files use, but if true thats certainly a bug and we can get that fixed. What file formats do pip/buildout support that PyPI doesn't support for uploads? giving up control are some of them. This is the point of running a package server, the author gives up control over distribution in order to reap the benefits of solid infrastructure and discoverability. This is a feature. The legal restrictions on code on pypi itself is nothing more than needed to let people actually install things, which is kind of the point of listing on pypi. If someone really wants their own universe, run a package server yourself. What other reasons are there? Agreeing to an extra license would block pip anyway, so no loss there. Huge package files maybe? That's not quite true: http://www.python.org/about/legal/ ... 
third party content providers grant the PSF and all other users of the web site an irrevocable, worldwide, royalty-free, nonexclusive license to reproduce, distribute, transmit, display, perform, and publish such content, including in digital form. Once you upload the files to PyPI, you completely give up control, because that license is irrevocable. This goes far beyond what the Python license does: http://docs.python.org/2/license.html Changing the PyPI terms to be more author-friendly is on my agenda, but I haven't found the time for that particular discussion yet ;-) You are comparing an artifact distribution requirement with a source code license. PyPI's terms don't say a thing about source code or anything else, just that if you want a package file to be installable, we need to be able to send it to people. There is nothing even remotely author unfriendly here, it is just common sense. Again, PyPI is _not_ the only way to publish packages, we are allowed to expect interoperability from people that choose to participate in our community. --Noah
Re: [Catalog-sig] Deprecate External Links
On 27 Feb 2013, at 19:23, Donald Stufft donald.stu...@gmail.com wrote: On Wednesday, February 27, 2013 at 12:44 PM, Donald Stufft wrote: Why not first have a good infrastructure and capacity with pypi.python.org so that people *want* to move their files there? PyPI has had very good uptime since the move to OSL. I don't have numbers handy but I believe I can get them. I got the numbers! Since almost a year ago (this was set up at the last US PyCon): Uptime: 99.99% Downtime: 6h 58m Number of Downtimes: 126 I want to stress again that even if that was a poor number, adding more points of failure only decreases the expected uptime, or at best does nothing. In fact, adding a caching CDN in front of PyPI (instead of the current mirror protocol) would probably bring the uptime close to 100% for people downloading packages via pip. I'm +1 on dropping the current (complicated) mirror system and external links, and in favor of centralizing everything into PyPI, plus a third-party CDN / hosting service. In fact, Python is a big-enough brand name that we could even get a CDN service almost for free in exchange for an acknowledgement of the CDN company being used. -- Giovanni Bajo :: ra...@develer.com Develer S.r.l. :: http://www.develer.com My Blog: http://giovanni.bajo.it
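For readers who want to relate a downtime total to an uptime percentage, here is a small sketch of the conversion. The monitoring window is only described as "almost a year", so the 365-day window below is an assumption and the result is indicative only; it need not match the percentage reported by the monitoring service. The helper name `uptime_fraction` is made up for illustration.

```python
from datetime import timedelta


def uptime_fraction(downtime: timedelta, window: timedelta) -> float:
    """Fraction of the window during which the service was reachable."""
    if window <= timedelta(0) or downtime > window:
        raise ValueError("window must be positive and at least as long as downtime")
    # Dividing one timedelta by another yields a plain float ratio.
    return 1.0 - downtime / window


downtime = timedelta(hours=6, minutes=58)  # total downtime quoted in the message
window = timedelta(days=365)               # assumption: a full year of monitoring
outages = 126                              # number of downtimes quoted in the message

print(f"uptime over the assumed window: {uptime_fraction(downtime, window):.3%}")
print(f"average outage length: {downtime / outages}")
```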
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 1:32 PM, Giovanni Bajo wrote: In fact, Python is a big enough brand name that we could even get a CDN service almost for free in exchange for an acknowledgement of the CDN company being used.

As far as I know this has already been offered in some form to Python.
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 1:34 PM, holger krekel wrote: On Wed, Feb 27, 2013 at 13:00 -0500, Jesse Noller wrote:

2. External links decrease the expected uptime for a particular set of requirements. PyPI itself has become very stable, however the same cannot be said for all of the hosts linked that the toolchain processes. Each new host is an additional SPOF. Ex: I depend on PyPI and 10 other external packages, each service has a 99% uptime so my expected uptime to be able to install all my requirements would be ~89% (0.99 ** 11).

There are many links which go to google, bitbucket or github - i doubt those services have worse availability than pypi.python.org, rather better. Also we would be losing a lot of packages because i expect there to be a non-trivial amount of packages which will not be transferred to pypi.python.org no matter how much people here think it's cool. Why not first have a good infrastructure and capacity with pypi.python.org so that people *want* to move their files there? best, holger

Ok, so we have that. What now?

I am not sure i understand. Just last week there were many installs going wrong - installs failing due to the http/https redirecting.

This same problem would have affected external urls as well because you cannot install something without having first contacted PyPI.

I've got at least 3 occasions myself in the last months where i couldn't use pypi.python.org and i've heard similar things from other people.

"Couldn't use pypi.python.org" is very vague. I hit PyPI every 15 seconds or so and rarely have issues. Lately when there have been installation problems it's been due to external services being down. For example Mercurial has recently been having problems because they don't host their packages on PyPI and their website has had downtime issues lately.

There is also the issue that it's not clear we could just put all packages from download locations to pypi.python.org due to sizing constraints - at least that is what i got from discussions here earlier.

If a package is too large for PyPI that is a solvable problem; the current limit exists as a sanity check, not for any hard technical reason.
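The ~89% figure quoted in this message follows from multiplying the per-host availabilities together. A minimal sketch of that arithmetic, assuming every host fails independently of the others (the function name is made up for illustration and is not part of any tool discussed here):

```python
def combined_availability(per_host_uptime: float, host_count: int) -> float:
    """Chance that all hosts are reachable at once, assuming independent failures."""
    return per_host_uptime ** host_count


# PyPI plus 10 external hosts, each assumed to be up 99% of the time.
print(f"{combined_availability(0.99, 11):.1%}")  # 89.5%
```

Under the same assumption, an install that touches only PyPI stays at the single-host figure, which is the point being made about each extra host being another SPOF.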
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 1:33 PM, Donald Stufft wrote: On Wednesday, February 27, 2013 at 1:32 PM, Giovanni Bajo wrote: In fact, Python is a big enough brand name that we could even get a CDN service almost for free in exchange for an acknowledgement of the CDN company being used.

As far as I know this has already been offered in some form to Python.

Yup, by like, 2 or 3 hosting companies.
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 11:37 AM, holger krekel hol...@merlinux.eu wrote: On Wed, Feb 27, 2013 at 19:34 +0100, Lennart Regebro wrote: On Wed, Feb 27, 2013 at 5:34 PM, M.-A. Lemburg m...@egenix.com wrote:

I'm not saying that it's not a good idea to host packages on PyPI, but forcing the community into doing this is not a good idea.

I still don't understand why not. The only reasons I've seen are "Because they don't want to" or "because they don't trust PyPI". And in the latter case I'm assuming they wouldn't use PyPI at all. And of course, nobody is forcing anyone, just like nobody is forcing you to use PyPI. :-)

I understood there is the idea to disable external links within a couple of months. That does break backward compatibility in a considerable way. holger

But wouldn't this only be a change in pip/easy_install, not PyPI itself? I suppose you could explicitly break the external links by having them point to nothing if you are worried about the security or if it's some performance issue (that would indeed be a bad compatibility break, in case people are using those for other purposes). Otherwise, if it's a problem, then just use the old version of pip. Aaron Meurer
Re: [Catalog-sig] Deprecate External Links
On a general note: It really warms my heart to see that people are warming up to the idea of using CDNs and getting rid of external downloads. I'm all for that. //Lennart
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 2:47 PM, Lennart Regebro wrote: On a general note: It really warms my heart to see that people are warming up to the idea of using CDNs and getting rid of external downloads. I'm all for that.

Excellent. So it's a date!
Re: [Catalog-sig] Deprecate External Links
On Feb 27, 2013, at 11:47 AM, Lennart Regebro wrote: On a general note: It really warms my heart to see that people are warming up to the idea of using CDNs and getting rid of external downloads. I'm all for that.

Just to be clear on this point:
1) Moving PyPI and other PSF properties behind a caching CDN will be happening; we just haven't had the cycles, but the foundation has been laid.
2) Moving PyPI to use cloud storage as its primary backing store (S3, Swift, etc) is not really determined; we might opt to move it to a local Gluster or Ceph cluster instead and do the origin serving ourselves, since it matters much less in light of #1.
3) Most importantly, this has absolutely nothing to do with the current discussion.
--Noah
Re: [Catalog-sig] Deprecate External Links
Would it be wrong to ask for a /complex API at the same time? The simple api, with 28k package names on one page, is getting a little silly.
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 2:56 PM, Aaron Meurer wrote: On Wed, Feb 27, 2013 at 12:49 PM, Monty Taylor mord...@inaugust.com (mailto:mord...@inaugust.com) wrote: On 02/27/2013 02:47 PM, Aaron Meurer wrote: On Wed, Feb 27, 2013 at 11:37 AM, holger krekel hol...@merlinux.eu (mailto:hol...@merlinux.eu) wrote: On Wed, Feb 27, 2013 at 19:34 +0100, Lennart Regebro wrote: On Wed, Feb 27, 2013 at 5:34 PM, M.-A. Lemburg m...@egenix.com (mailto:m...@egenix.com) wrote: I'm not saying that it's not a good idea to host packages on PyPI, but forcing the community into doing this is not a good idea. I still don't understand why not. The only reasons I've seen are Because they don't want to or because they don't trust PyPI. And in the latter case I'm assuming they wouldn't use PyPI at all. And of course, nobody is forcing anyone, just like nobody is forcing you to use PyPI. :-) I understood there is the idea to disable external links within a couple of months. That does break backward compatibility in a considerable way. holger But wouldn't this only be a change in pip/easy_install, not PyPI itself? I suppose you could explicitly break the external links by having them point to nothing if you are worried about the security or if it's some performance issue (that would indeed be a bad compatibility break, in case people are using those for other purposes). Otherwise, if it's a problem, then just use the old version of pip. If we don't remove the feature from pypi itself, then it won't help the folks for whom its a problem, because there will be no incentive for the folks hosting their software that way to actually upload their stuff to PyPI - which means that client-side disabling of external_links is fairly likely to never be usable. How would you remove it from PyPI itself? Would that just require changing some urls, so that pip doesn't know where to find stuff any more? Modify the PyPI software to no longer link to those urls. Sorry if this is obvious. 
I'm not a pip/PyPI developer. Just a package maintainer who has been irked several times by pip's/PyPI's/easy_install's idiotic external links policy.
Re: [Catalog-sig] Deprecate External Links
On Feb 27, 2013, at 1:01 PM, Donald Stufft donald.stu...@gmail.com wrote: On Wednesday, February 27, 2013 at 2:56 PM, Aaron Meurer wrote: On Wed, Feb 27, 2013 at 12:49 PM, Monty Taylor mord...@inaugust.com wrote: On 02/27/2013 02:47 PM, Aaron Meurer wrote: On Wed, Feb 27, 2013 at 11:37 AM, holger krekel hol...@merlinux.eu wrote: On Wed, Feb 27, 2013 at 19:34 +0100, Lennart Regebro wrote: On Wed, Feb 27, 2013 at 5:34 PM, M.-A. Lemburg m...@egenix.com wrote: I'm not saying that it's not a good idea to host packages on PyPI, but forcing the community into doing this is not a good idea. I still don't understand why not. The only reasons I've seen are Because they don't want to or because they don't trust PyPI. And in the latter case I'm assuming they wouldn't use PyPI at all. And of course, nobody is forcing anyone, just like nobody is forcing you to use PyPI. :-) I understood there is the idea to disable external links within a couple of months. That does break backward compatibility in a considerable way. holger But wouldn't this only be a change in pip/easy_install, not PyPI itself? I suppose you could explicitly break the external links by having them point to nothing if you are worried about the security or if it's some performance issue (that would indeed be a bad compatibility break, in case people are using those for other purposes). Otherwise, if it's a problem, then just use the old version of pip. If we don't remove the feature from pypi itself, then it won't help the folks for whom its a problem, because there will be no incentive for the folks hosting their software that way to actually upload their stuff to PyPI - which means that client-side disabling of external_links is fairly likely to never be usable. How would you remove it from PyPI itself? Would that just require changing some urls, so that pip doesn't know where to find stuff any more? Modify the PyPI software to no longer link to those urls. Right. 
As I was saying, this would break any other tools that might use those urls, perhaps for less nefarious purposes. But then again, that's somewhat speculative. If someone can point out something that uses them, that will be something to consider, but for now, the main thing we know uses it is pip (and easy_install), and the whole point is to break them. Aaron Meurer Sorry if this is obvious. I'm not a pip/PyPI developer. Just a package maintainer who has been irked several times by pip's/PyPI's/easy_install's idiotic external links policy.
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 3:08 PM, Aaron Meurer asmeu...@gmail.com wrote: On Feb 27, 2013, at 1:01 PM, Donald Stufft donald.stu...@gmail.com wrote: On Wednesday, February 27, 2013 at 2:56 PM, Aaron Meurer wrote: On Wed, Feb 27, 2013 at 12:49 PM, Monty Taylor mord...@inaugust.com wrote: On 02/27/2013 02:47 PM, Aaron Meurer wrote: On Wed, Feb 27, 2013 at 11:37 AM, holger krekel hol...@merlinux.eu wrote: On Wed, Feb 27, 2013 at 19:34 +0100, Lennart Regebro wrote: On Wed, Feb 27, 2013 at 5:34 PM, M.-A. Lemburg m...@egenix.com wrote: I'm not saying that it's not a good idea to host packages on PyPI, but forcing the community into doing this is not a good idea. I still don't understand why not. The only reasons I've seen are Because they don't want to or because they don't trust PyPI. And in the latter case I'm assuming they wouldn't use PyPI at all. And of course, nobody is forcing anyone, just like nobody is forcing you to use PyPI. :-) I understood there is the idea to disable external links within a couple of months. That does break backward compatibility in a considerable way. holger But wouldn't this only be a change in pip/easy_install, not PyPI itself? I suppose you could explicitly break the external links by having them point to nothing if you are worried about the security or if it's some performance issue (that would indeed be a bad compatibility break, in case people are using those for other purposes). Otherwise, if it's a problem, then just use the old version of pip. If we don't remove the feature from pypi itself, then it won't help the folks for whom its a problem, because there will be no incentive for the folks hosting their software that way to actually upload their stuff to PyPI - which means that client-side disabling of external_links is fairly likely to never be usable. How would you remove it from PyPI itself? Would that just require changing some urls, so that pip doesn't know where to find stuff any more? 
Modify the PyPI software to no longer link to those urls. Right. As I was saying, this would break any other tools that might use those urls, perhaps for less nefarious purposes. But then again, that's somewhat speculative. If someone can point out something that uses them, that will be something to consider, but for now, the main thing we know uses it is pip (and easy_install), and the whole point is to break them. Aaron Meurer Sorry if this is obvious. I'm not a pip/PyPI developer. Just a package maintainer who has been irked several times by pip's/PyPI's/easy_install's idiotic external links policy.

Or just expose a new "no external links" API, otherwise the same as the simple API (pretty sure crate offers this), that will be the default in the next release of pip, giving people a little more control over when their packaging tool breaks.
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 14:49 -0500, Monty Taylor wrote: On 02/27/2013 02:47 PM, Aaron Meurer wrote: On Wed, Feb 27, 2013 at 11:37 AM, holger krekel hol...@merlinux.eu wrote: On Wed, Feb 27, 2013 at 19:34 +0100, Lennart Regebro wrote: On Wed, Feb 27, 2013 at 5:34 PM, M.-A. Lemburg m...@egenix.com wrote: I'm not saying that it's not a good idea to host packages on PyPI, but forcing the community into doing this is not a good idea. I still don't understand why not. The only reasons I've seen are Because they don't want to or because they don't trust PyPI. And in the latter case I'm assuming they wouldn't use PyPI at all. And of course, nobody is forcing anyone, just like nobody is forcing you to use PyPI. :-) I understood there is the idea to disable external links within a couple of months. That does break backward compatibility in a considerable way. holger But wouldn't this only be a change in pip/easy_install, not PyPI itself? I suppose you could explicitly break the external links by having them point to nothing if you are worried about the security or if it's some performance issue (that would indeed be a bad compatibility break, in case people are using those for other purposes). Otherwise, if it's a problem, then just use the old version of pip. If we don't remove the feature from pypi itself, then it won't help the folks for whom its a problem, because there will be no incentive for the folks hosting their software that way to actually upload their stuff to PyPI - which means that client-side disabling of external_links is fairly likely to never be usable. I can see it's tempting to just try to force everyone to upload their stuff to pypi.python.org. I am very skeptical about this approach. There already is a high frustration with the packaging ecology in Python. When we remove external links on the server side, installs for many people and companies are going to break, no matter what. And they would have no client-side switch anymore to make things working. 
Requiring the use of older setuptools/pip versions would not help because the server information is gone. That'd just increase frustration. So at the very least using external links needs to be a client-side installer choice for a long while and the server needs to offer the corresponding information. I'd generally prefer to think hard about ways to improve the situation without breaking things. Putting simple/ and packaging serving on a CDN is one such step and a good idea i think. Establishing a signing/verification mechanism is another. Refining py2/py3 dependency discovery yet another good one. best, holger
Re: [Catalog-sig] Deprecate External Links
On Feb 27, 2013, at 12:16 PM, holger krekel wrote: On Wed, Feb 27, 2013 at 14:49 -0500, Monty Taylor wrote: On 02/27/2013 02:47 PM, Aaron Meurer wrote: On Wed, Feb 27, 2013 at 11:37 AM, holger krekel hol...@merlinux.eu wrote: On Wed, Feb 27, 2013 at 19:34 +0100, Lennart Regebro wrote: On Wed, Feb 27, 2013 at 5:34 PM, M.-A. Lemburg m...@egenix.com wrote: I'm not saying that it's not a good idea to host packages on PyPI, but forcing the community into doing this is not a good idea. I still don't understand why not. The only reasons I've seen are Because they don't want to or because they don't trust PyPI. And in the latter case I'm assuming they wouldn't use PyPI at all. And of course, nobody is forcing anyone, just like nobody is forcing you to use PyPI. :-) I understood there is the idea to disable external links within a couple of months. That does break backward compatibility in a considerable way. holger But wouldn't this only be a change in pip/easy_install, not PyPI itself? I suppose you could explicitly break the external links by having them point to nothing if you are worried about the security or if it's some performance issue (that would indeed be a bad compatibility break, in case people are using those for other purposes). Otherwise, if it's a problem, then just use the old version of pip. If we don't remove the feature from pypi itself, then it won't help the folks for whom its a problem, because there will be no incentive for the folks hosting their software that way to actually upload their stuff to PyPI - which means that client-side disabling of external_links is fairly likely to never be usable. I can see it's tempting to just try to force everyone to upload their stuff to pypi.python.org. I am very skeptical about this approach. There already is a high frustration with the packaging ecology in Python. When we remove external links on the server side, installs for many people and companies are going to break, no matter what. 
And they would have no client-side switch anymore to make things work. Requiring the use of older setuptools/pip versions would not help because the server information is gone. That'd just increase frustration. So at the very least using external links needs to be a client-side installer choice for a long while and the server needs to offer the corresponding information. I'd generally prefer to think hard about ways to improve the situation without breaking things. Putting simple/ and packaging serving on a CDN is one such step and a good idea i think. Establishing a signing/verification mechanism is another. Refining py2/py3 dependency discovery yet another good one.

None of these things have anything to do with improving _this issue_, though they would make PyPI better and will be tackled at some point. This is a feature that must be removed if we are going to operate a trustable packaging network. Today, tomorrow, or six months from now, this feature will be going away; the only question is how we get there. Yes, things will break. We also broke old users of pypissh a few weeks ago as part of the SSL lockdown; this is an acceptable loss as deprecation schedules were made and followed. We will not randomly disable these links today; as you said, the right first move will be to show a warning (and then an error) in pip/buildout when using these links. Donald has already begun that conversation with each of the tool developers. We will need a global plan though, an overarching schedule to work with pip and buildout (and easy_install if someone does the legwork there) about how to announce this removal and how to ensure we break as few people as possible over the course of the plan. That is what this discussion is about. --Noah
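The warn-first, error-later sequence described in this message can be sketched as a date check on the installer side. This is only an illustration: the two dates and the function name below are hypothetical placeholders, not a schedule anyone in this thread has proposed, and it is not how pip or buildout are actually implemented.

```python
from datetime import date

# Hypothetical placeholder dates, chosen only to make the example runnable.
WARN_FROM = date(2013, 6, 1)
ERROR_FROM = date(2013, 12, 1)


def external_link_action(today: date) -> str:
    """Return what an installer would do when it meets an external link."""
    if today >= ERROR_FROM:
        return "error"   # final phase: refuse to follow the link
    if today >= WARN_FROM:
        return "warn"    # deprecation phase: follow it, but tell the user
    return "allow"       # before the schedule starts: current behaviour


print(external_link_action(date(2013, 3, 1)))
```

Keeping the decision in one function would let each tool agree on the phases while choosing its own wording for the warning.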
Re: [Catalog-sig] Deprecate External Links
As far as I'm concerned, pip is broken too, in the sense that the method we use to make pip work in Python 3 is a bit of an annoying hack (namely, upload a separate tarball for each minor Python 3 version).

I agree it's a hack, but only >=1.2 package metadata supports Requires-Python and nothing is writing that now (except for wheel). If newer metadata were pervasive and available on PyPI, pip could respond to it. I think it would probably automatically start showing up in the JSON and XML interfaces, but it would require some changes to expose an HTML attribute for the simple interface, which pip currently uses. Marcus
Re: [Catalog-sig] Deprecate External Links
On Feb 28, 2013 2:26 AM, Donald Stufft donald.stu...@gmail.com wrote: I propose we deprecate the external links that PyPI has published on the /simple/ indexes which exist because of the history of PyPI. +1
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 3:27 PM, Donald Stufft donald.stu...@gmail.com wrote: I'm not asking for this to be shut off immediately; it will be phased, particularly so project maintainers can be made aware that it's going away and can upload versions to PyPI to prevent this kind of widespread breakage. In particular, the first phase I outlined for PyPI was to disable _new_ links from being added to the /simple/ pages while keeping the old ones around, so that _old_ releases still work for now, but _new_ ones do not.

+1 Here is the critical bit: *new releases*. There is no extra work for package managers until a new release is made. I think most package managers would rather adjust their processes to ensure that users of the package can access it securely and reliably. It is much easier to concentrate work on the reliability of PyPI than on hundreds of individual sites hosting packages that at this point likely don't even have SSL. I think most users would rather get the packages from PyPI infrastructure and, as was already posted, new users probably don't realize that pip/easy_install hits external dependencies. -Chris -- Christopher Lambacher ch...@kateandchris.net
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 8:49 PM, Monty Taylor mord...@inaugust.com wrote: But wouldn't this only be a change in pip/easy_install, not PyPI itself? I suppose you could explicitly break the external links by having them point to nothing if you are worried about the security or if it's some performance issue (that would indeed be a bad compatibility break, in case people are using those for other purposes). Otherwise, if it's a problem, then just use the old version of pip.

If we don't remove the feature from pypi itself

It isn't a feature of PyPI. PyPI doesn't require you to upload the files to PyPI. For that reason, easy_install and pip will scrape external sites to be able to download the files. What we should do is agree that this should stop, add a deprecation warning to pip and easy_install, and after some pre-determined time remove the feature from easy_install and pip.

folks for whom it's a problem, because there will be no incentive for the folks hosting their software that way to actually upload their stuff to PyPI

Yes there will be: everyone mailing them to tell them their software is broken and can't be installed with easy_install and pip. That's going to be very annoying very fast. ;-) //Lennart
Re: [Catalog-sig] Deprecate External Links
On 02/27/2013 04:04 PM, Lennart Regebro wrote: On Wed, Feb 27, 2013 at 8:49 PM, Monty Taylor mord...@inaugust.com wrote: But wouldn't this only be a change in pip/easy_install, not PyPI itself? I suppose you could explicitly break the external links by having them point to nothing if you are worried about the security or if it's some performance issue (that would indeed be a bad compatibility break, in case people are using those for other purposes). Otherwise, if it's a problem, then just use the old version of pip.

If we don't remove the feature from pypi itself

It isn't a feature of PyPI. PyPI doesn't require you to upload the files to PyPI. For that reason, easy_install and pip will scrape external sites to be able to download the files. What we should do is agree that this should stop, add a deprecation warning to pip and easy_install, and after some pre-determined time remove the feature from easy_install and pip.

Good point.

folks for whom it's a problem, because there will be no incentive for the folks hosting their software that way to actually upload their stuff to PyPI

Yes there will be: everyone mailing them to tell them their software is broken and can't be installed with easy_install and pip. That's going to be very annoying very fast. ;-)

++ We could also write an easy utility that a maintainer could run on their project, like: suck_in my_package. It would query current PyPI for a list of available releases of my_package, then post them as a direct upload to PyPI and finally remove the external link. That way, once someone annoys them, there's an easy answer for how to migrate.
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 1:34 PM, Lennart Regebro rege...@gmail.com wrote: On Wed, Feb 27, 2013 at 5:34 PM, M.-A. Lemburg m...@egenix.com wrote: I'm not saying that it's not a good idea to host packages on PyPI, but forcing the community into doing this is not a good idea.

I still don't understand why not. The only reasons I've seen are "Because they don't want to" or "because they don't trust PyPI". And in the latter case I'm assuming they wouldn't use PyPI at all.

I haven't seen anybody mention it yet, but checkouts of development versions are a use case that can't currently be addressed without support for multiple external links. For example, setuptools itself offers SVN checkout URLs for two different branches. I've also seen in-development packages offered via github or bitbucket checkouts as well.
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 9:01 PM, Donald Stufft donald.stu...@gmail.com wrote: Modify the PyPI software to no longer link to those urls. Well, I guess we can remove the software home page and the download URL's from the simple index. For example, in PIL's case the simple index looks like this: 1.1.5a1 home_page 1.1.5a1 download_url 1.1.4 home_page 1.1.5 home_page 1.1.5 download_url 1.1.5a2 home_page 1.1.5a2 download_url 1.1.3 home_page 1.1.3 download_url 1.1.6 home_page 1.1.6 download_url (Each of those is a link) That result in the following actions from easy_install, where Process url: means it looks at the URL to see if it is a distribution package, or if it is HTML, if that page possibly contains links that could be a distribution package, and Found link: means that it found a distribution package. Process url: http://pypi.python.org/simple/PIL/ Process url: http://www.pythonware.com/products/pil Process url: http://effbot.org/zone/pil-changes-115.htm Process url: http://www.pythonware.com/products/pil/ Process url: http://www.pythonware.com/products/pil Process url: http://effbot.org/zone/pil-changes-115.htm Process url: http://www.pythonware.com/products/pil Process url: http://effbot.org/zone/pil-changes-115.htm Process url: http://www.pythonware.com/products/pil/ Process url: http://www.pythonware.com/downloads/Imaging-1.1.3.tar.gz Found link: http://www.pythonware.com/downloads/Imaging-1.1.3.tar.gz Process url: http://www.pythonware.com/products/pil Process url: http://effbot.org/downloads/#Imaging Process url: http://www.pythonware.com/products/pil Reading http://www.pythonware.com/products/pil Process url: http://www.pythonware.com/media/css/pythonware.css Process url: http://www.pythonware.com/index.htm Process url: http://www.pythonware.com/products/index.htm Process url: http://www.pythonware.com/library/index.htm Process url: http://www.pythonware.com/search.htm Process url: http://www.pythonware.com/daily/index.htm Process url: 
http://www.pythonware.com/products/ Process url: http://www.pythonware.com/products/pil/support.htm Process url: http://www.pythonware.com/products/pil/old.htm Process url: http://www.pythonware.com/products/pil/license.htm Process url: http://www.pythonware.com/products/pil/faq.htm Process url: http://www.djangoproject.com/ Process url: http://www.pythonware.com/products/pil/license.htm Process url: http://www.pythonware.com/products/pil/#pil117 Process url: mailto:image-...@python.org Process url: http://mail.python.org/mailman/listinfo/image-sig Process url: mailto:image-sig-requ...@python.org Process url: http://effbot.org/downloads/Imaging-1.1.7.tar.gz Found link: http://effbot.org/downloads/Imaging-1.1.7.tar.gz Process url: http://effbot.org/downloads/PIL-1.1.7.win32-py2.4.exe Found link: http://effbot.org/downloads/PIL-1.1.7.win32-py2.4.exe Process url: http://effbot.org/downloads/PIL-1.1.7.win32-py2.5.exe Found link: http://effbot.org/downloads/PIL-1.1.7.win32-py2.5.exe Process url: http://effbot.org/downloads/PIL-1.1.7.win32-py2.6.exe Found link: http://effbot.org/downloads/PIL-1.1.7.win32-py2.6.exe Process url: http://effbot.org/downloads/PIL-1.1.7.win32-py2.7.exe Found link: http://effbot.org/downloads/PIL-1.1.7.win32-py2.7.exe Process url: http://effbot.org/downloads#pil Process url: http://effbot.org/downloads/Imaging-1.1.6.tar.gz Found link: http://effbot.org/downloads/Imaging-1.1.6.tar.gz Process url: http://effbot.org/downloads/PIL-1.1.6.win32-py2.2.exe Found link: http://effbot.org/downloads/PIL-1.1.6.win32-py2.2.exe Process url: http://effbot.org/downloads/PIL-1.1.6.win32-py2.3.exe Found link: http://effbot.org/downloads/PIL-1.1.6.win32-py2.3.exe Process url: http://effbot.org/downloads/PIL-1.1.6.win32-py2.4.exe Found link: http://effbot.org/downloads/PIL-1.1.6.win32-py2.4.exe Process url: http://effbot.org/downloads/PIL-1.1.6.win32-py2.5.exe Found link: http://effbot.org/downloads/PIL-1.1.6.win32-py2.5.exe Process url: 
http://effbot.org/downloads/PIL-1.1.6.win32-py2.6.exe Found link: http://effbot.org/downloads/PIL-1.1.6.win32-py2.6.exe Process url: http://effbot.org/zone/pil-changes-116.htm Process url: http://effbot.org/zone/python-register.htm Process url: http://effbot.org/downloads/Imaging-1.1.5.tar.gz Found link: http://effbot.org/downloads/Imaging-1.1.5.tar.gz Process url: http://effbot.org/downloads/PIL-1.1.5.win32-py2.1.exe Found link: http://effbot.org/downloads/PIL-1.1.5.win32-py2.1.exe Process url: http://effbot.org/downloads/PIL-1.1.5.win32-py2.2.exe Found link: http://effbot.org/downloads/PIL-1.1.5.win32-py2.2.exe Process url: http://effbot.org/downloads/PIL-1.1.5.win32-py2.3.exe Found link: http://effbot.org/downloads/PIL-1.1.5.win32-py2.3.exe Process url: http://effbot.org/downloads/PIL-1.1.5.win32-py2.4.exe Found link: http://effbot.org/downloads/PIL-1.1.5.win32-py2.4.exe Process url: http://effbot.org/downloads/PIL-1.1.5.win32-py2.5.exe Found link: http://effbot.org/downloads/PIL-1.1.5.win32-py2.5.exe Process url:
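The crawl log above comes down to one decision per link on a /simple/ page: is the target on the index host, or somewhere else that the installer then has to fetch and scan? A minimal sketch of that split using only the standard library. The sample markup and the hosted file name are invented for illustration (the two external URLs are taken from the log), and this is not the code easy_install or pip actually use.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the absolute target of every <a href=...> on a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(urljoin(self.page_url, href))


def split_links(page_url, html):
    """Return (hosted, external) link lists for one index page."""
    collector = _LinkCollector(page_url)
    collector.feed(html)
    index_host = urlparse(page_url).netloc
    hosted = [u for u in collector.links if urlparse(u).netloc == index_host]
    external = [u for u in collector.links if urlparse(u).netloc != index_host]
    return hosted, external


# Invented sample page: one hypothetical hosted file plus two external links.
SAMPLE = """
<a href="../../packages/source/P/PIL/PIL-0.0.tar.gz">PIL-0.0.tar.gz</a>
<a href="http://www.pythonware.com/products/pil">1.1.6 home_page</a>
<a href="http://effbot.org/zone/pil-changes-115.htm">1.1.5 download_url</a>
"""

hosted, external = split_links("http://pypi.python.org/simple/PIL/", SAMPLE)
print(len(hosted), "hosted;", len(external), "external")
```

Dropping external links on the client side would amount to ignoring the second list; removing them on the server side would mean the index never emits those anchors in the first place.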
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 4:04 PM, Lennart Regebro rege...@gmail.com wrote: On Wed, Feb 27, 2013 at 8:49 PM, Monty Taylor mord...@inaugust.com wrote: But wouldn't this only be a change in pip/easy_install, not PyPI itself? I suppose you could explicitly break the external links by having them point to nothing if you are worried about the security or if it's some performance issue (that would indeed be a bad compatibility break, in case people are using those for other purposes). Otherwise, if it's a problem, then just use the old version of pip. If we don't remove the feature from pypi itself It isn't a feature of PyPI. PyPI doesn't require you to upload the files to PyPI. For that reason, easy_install and PIP will scrape external sites to be able to download the files. What we should do is agree that this should stop, So far, I don't think anybody's talking to the right we for stopping it. It's the tools that control this, not PyPI. (PyPI can't actually stop the tools from using this information without also making itself a lot less useful to *humans* at the same time.) As far as my personal position on the matter, I think that it's reasonable to deprecate the scraping of home page and download links. As somebody pointed out, expired domains are a potentially nasty problem there. OTOH, I currently make development snapshots of setuptools and other projects available by dumping them in a directory that's used as an external download URL. Replacing that would be a PITA because PyPI only lets you upload and register new releases from distutils' command line. Basically, I'd need to use a download link that pointed to a latest URL that redirected to the final download. Anyway, I'm not seeing much discussion here about how to help authors make changes to their release processes. Note that many popular and long-lived projects (pywin32, PIL, etc.) have similar issues. (Not to mention the newer projects that host directly from revision control.) 
Given that easy_install was deliberately designed so that those guys would *not* need to change their hosting strategies to get automated downloads, I'd like to see more talk about how we're going to help people change their releasing and hosting strategies.
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 10:17 PM, PJ Eby p...@telecommunity.com wrote: I haven't seen anybody mention it yet, but checkouts of development versions are a use case that can't currently be addressed without support for multiple external links. For example, setuptools itself offers SVN checkout URLs for two different branches. I've also seen in-development packages offered via github or bitbucket checkouts as well. These versions should not be installed unless the installer is explicitly told to install just those versions, so that is really not connected to this issue. You should of course be able to install files both locally and from a specific URL. But the development tgz created and hosted on github should IMO never be installed by just saying easy_install frobnitz or even easy_install frobnitz==1.3.4dev5 //Lennart
Re: [Catalog-sig] Deprecate External Links
On Feb 27, 2013, at 1:31 PM, PJ Eby wrote: On Wed, Feb 27, 2013 at 4:04 PM, Lennart Regebro rege...@gmail.com wrote: On Wed, Feb 27, 2013 at 8:49 PM, Monty Taylor mord...@inaugust.com wrote: But wouldn't this only be a change in pip/easy_install, not PyPI itself? I suppose you could explicitly break the external links by having them point to nothing if you are worried about the security or if it's some performance issue (that would indeed be a bad compatibility break, in case people are using those for other purposes). Otherwise, if it's a problem, then just use the old version of pip. If we don't remove the feature from pypi itself It isn't a feature of PyPI. PyPI doesn't require you to upload the files to PyPI. For that reason, easy_install and PIP will scrape external sites to be able to download the files. What we should do is agree that this should stop, So far, I don't think anybody's talking to the right we for stopping it. It's the tools that control this, not PyPI. (PyPI can't actually stop the tools from using this information without also making itself a lot less useful to *humans* at the same time.) As far as my personal position on the matter, I think that it's reasonable to deprecate the scraping of home page and download links. As somebody pointed out, expired domains are a potentially nasty problem there. OTOH, I currently make development snapshots of setuptools and other projects available by dumping them in a directory that's used as an external download URL. Replacing that would be a PITA because PyPI only lets you upload and register new releases from distutils' command line. Basically, I'd need to use a download link that pointed to a latest URL that redirected to the final download. Anyway, I'm not seeing much discussion here about how to help authors make changes to their release processes. Note that many popular and long-lived projects (pywin32, PIL, etc.) have similar issues. 
(Not to mention the newer projects that host directly from revision control.) Given that easy_install was deliberately designed so that those guys would *not* need to change their hosting strategies to get automated downloads, I'd like to see more talk about how we're going to help people change their releasing and hosting strategies. To be honest, either they will adapt or replacements will arise (see also: Pillow). PIL is a great example of something that can and _should_ be completely broken since it is already 90% broken anyway. --Noah
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 4:17 PM, PJ Eby wrote: On Wed, Feb 27, 2013 at 1:34 PM, Lennart Regebro rege...@gmail.com wrote: On Wed, Feb 27, 2013 at 5:34 PM, M.-A. Lemburg m...@egenix.com wrote: I'm not saying that it's not a good idea to host packages on PyPI, but forcing the community into doing this is not a good idea. I still don't understand why not. The only reasons I've seen are Because they don't want to or because they don't trust PyPI. And in the latter case I'm assuming they wouldn't use PyPI at all. I haven't seen anybody mention it yet, but checkouts of development versions are a use case that can't currently be addressed without support for multiple external links. For example, setuptools itself offers SVN checkout URLs for two different branches. I've also seen in-development packages offered via github or bitbucket checkouts as well. Is this http://svn.python.org/projects/sandbox/trunk/setuptools/#egg=setuptools-dev and http://svn.python.org/projects/sandbox/branches/setuptools-0.6/#egg=setuptools-dev06 ? I don't think they belong on the main repo page. Not every project supports this, and the ones that do use varying names, is there anything wrong with just updating your instructions to say instead of (please replace with easy_install lingo here) `pip install setuptools==setuptools-dev` please `pip install -e http://svn.python.org/projects/sandbox/trunk/setuptools/#egg=setuptools-dev` ? Alternatively if the extra typing is really not desired then I'd say let's add a separate method (/dev/setuptools/ for example?) that only links these external development urls. And update the tooling to check there via a --dev flag or something. I still don't think needing to specify the full url is a terrible burden though.
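For readers unfamiliar with the `#egg=` convention used in the URLs above, here is a minimal sketch of how a tool could read the project name out of such a URL. The helper name `egg_name` is hypothetical, and this is not claimed to be how pip or easy_install implement it; it only relies on Python's standard `urllib.parse`.

```python
from urllib.parse import urlparse, parse_qs


def egg_name(url):
    """Return the value of the egg= key in the URL fragment, or None.

    Hypothetical helper, for illustration only.
    """
    fragment = urlparse(url).fragment        # the text after '#'
    values = parse_qs(fragment).get("egg")   # fragment looks like 'egg=name'
    return values[0] if values else None


url = "http://svn.python.org/projects/sandbox/trunk/setuptools/#egg=setuptools-dev"
print(egg_name(url))
```

A URL without an `#egg=` fragment simply yields `None` here, which is one reason an installer would need the name stated explicitly in that case.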
Re: [Catalog-sig] Deprecate External Links
On 27 Feb 2013, at 21:16, holger krekel hol...@merlinux.eu wrote: On Wed, Feb 27, 2013 at 14:49 -0500, Monty Taylor wrote: On 02/27/2013 02:47 PM, Aaron Meurer wrote: If we don't remove the feature from pypi itself, then it won't help the folks for whom it's a problem, because there will be no incentive for the folks hosting their software that way to actually upload their stuff to PyPI - which means that client-side disabling of external_links is fairly likely to never be usable. I can see it's tempting to just try to force everyone to upload their stuff to pypi.python.org. I am very skeptical about this approach. I can totally understand why users would want to force maintainers to upload stuff to pypi.python.org after another failed build caused by a dependency on third-party infrastructure. While our package index is not perfect, lately it seems the main problem is with external packages. There already is a high frustration with the packaging ecology in Python. When we remove external links on the server side, installs for many people and companies are going to break, no matter what. As Donald points out, we would only do this for new releases. This would break no existing releases for users. Speaking of frustration and breakage though, let's say Mercurial or python-memcached isn't available because their website is down. Where can you go? Unless you have a pip-cached copy, the answer is too often nowhere. -- Best regards, Łukasz Langa WWW: http://lukasz.langa.pl/ Twitter: @llanga IRC: ambv on #python-dev
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 10:31 PM, PJ Eby p...@telecommunity.com wrote: Replacing that would be a PITA because PyPI only lets you upload and register new releases from distutils' command line. You can upload files, but not create new releases. But that seems like a pretty minor addition, or? Anyway, I'm not seeing much discussion here about how to help authors make changes to their release processes. Note that many popular and long-lived projects (pywin32, PIL, etc.) have similar issues. I know I probably have tunnel vision here, but I'm not sure what the issues are. :-) //Lennart
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 4:31 PM, PJ Eby wrote: So far, I don't think anybody's talking to the right we for stopping it. It's the tools that control this, not PyPI. (PyPI can't actually stop the tools from using this information without also making itself a lot less useful to *humans* at the same time.) I have issues out for pip and buildout, didn't have time to find and make issues for setuptools and distribute but I plan on doing that as well. However PyPI _can_ stop publishing that info on the simple index. If tooling wants to go out of their way to scrape the human pages that's their problem and would be unsupported. By not publishing that content we make a clear line of what is and isn't supported for the tooling to use. As far as my personal position on the matter, I think that it's reasonable to deprecate the scraping of home page and download links. As somebody pointed out, expired domains are a potentially nasty problem there. OTOH, I currently make development snapshots of setuptools and other projects available by dumping them in a directory that's used as an external download URL. Replacing that would be a PITA because PyPI only lets you upload and register new releases from distutils' command line. Basically, I'd need to use a download link that pointed to a latest URL that redirected to the final download. Development snapshots are a use case that I'm not sure makes sense for PyPI, but if they do, they should require specific opt-in to install them. Does easy_install have a command line flag that adds extra links? pip has --find-links; can your instructions simply state to do the equivalent of `pip install --find-links=http://setuptools.com/dev-snapshopts/`? Alternatively I would like to get the tooling smarter about not installing pre-release versions unless asked as well.
So with that the answer could simply be to make dev releases to PyPI, (PyPI will probably need some sort of prefer stable option for its web UI), and have the tooling prefer stable releases. Anyway, I'm not seeing much discussion here about how to help authors make changes to their release processes. Note that many popular and long-lived projects (pywin32, PIL, etc.) have similar issues. (Not to mention the newer projects that host directly from revision control.) Most of these projects are already running python setup.py register, so for the vast bulk of them they'll just need to add a sdist upload to that. Given that easy_install was deliberately designed so that those guys would *not* need to change their hosting strategies to get automated downloads, I'd like to see more talk about how we're going to help people change their releasing and hosting strategies. Someone has made a comment about making a script to make it easy to make old versions available on PyPI for authors. I believe for most people the change should be fairly easy since they are already registering their releases. However if someone has an odd release process I'd be willing to try and help them fit the new requirement into it.
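To make the "prefer stable releases unless asked" idea above concrete, here is a rough sketch. Everything in it is an assumption for illustration: the pre-release markers are a simplified guess, the names `is_prerelease` and `pick_version` are hypothetical, and it is not the behaviour of pip or easy_install.

```python
import re

# Assumed, simplified pre-release markers: a trailing a/b/c/rc/dev tag
# with an optional number, e.g. "1.0rc1" or "1.3.4dev5".
_PRE_RELEASE = re.compile(r"(a|b|c|rc|dev)\d*$", re.IGNORECASE)


def is_prerelease(version):
    """Guess whether a version string names a pre-release."""
    return bool(_PRE_RELEASE.search(version))


def pick_version(versions, allow_prereleases=False):
    """Pick the last acceptable version.

    `versions` is assumed to be sorted oldest to newest already;
    ordering versions is a separate problem not tackled here.
    """
    candidates = [v for v in versions
                  if allow_prereleases or not is_prerelease(v)]
    return candidates[-1] if candidates else None


available = ["0.7.1", "0.7.2", "0.7.3rc1"]
print(pick_version(available))                          # stable only
print(pick_version(available, allow_prereleases=True))  # explicit opt-in
```

The point of the sketch is only the opt-in: the release candidate is visible to the tool but ignored unless the user asks for it.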
Re: [Catalog-sig] Deprecate External Links
On 28 February 2013 08:31, PJ Eby p...@telecommunity.com wrote: OTOH, I currently make development snapshots of setuptools and other projects available by dumping them in a directory that's used as an external download URL. Replacing that would be a PITA because PyPI only lets you upload and register new releases from distutils' command line. Basically, I'd need to use a download link that pointed to a latest URL that redirected to the final download. Yup, and the down-side of distutils as the tool for talking to PyPI is, of course, the horrendous turn-around time trying to add features or fix bugs. I've advocated us having the upload/register/whatever functionality in a separate tool for a while, but that doesn't seem to have gained any traction. Of course issues around the complexity introduced by setup.py make it much harder. In the meantime I think Donald's suggestion for supporting development pre-releases is reasonable: instead of (please replace with easy_install lingo here) `pip install setuptools==setuptools-dev` please `pip install -e http://svn.python.org/projects/sandbox/trunk/setuptools/#egg=setuptools-dev` ? Richard
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 11:48 PM, Richard Jones rich...@python.org wrote: I've advocated us having the upload/register/whatever functionality in a separate tool for a while, but that doesn't seem to have gained any traction. Of course issues around the complexity introduced by setup.py make it much harder. Well, if we break distutils, we would have to make a separate tool. Is it a problem or is it an opportunity? :-) //Lennart
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 2:31 PM, PJ Eby p...@telecommunity.com wrote: On Wed, Feb 27, 2013 at 4:04 PM, Lennart Regebro rege...@gmail.com wrote: On Wed, Feb 27, 2013 at 8:49 PM, Monty Taylor mord...@inaugust.com wrote: But wouldn't this only be a change in pip/easy_install, not PyPI itself? I suppose you could explicitly break the external links by having them point to nothing if you are worried about the security or if it's some performance issue (that would indeed be a bad compatibility break, in case people are using those for other purposes). Otherwise, if it's a problem, then just use the old version of pip. If we don't remove the feature from pypi itself It isn't a feature of PyPI. PyPI doesn't require you to upload the files to PyPI. For that reason, easy_install and PIP will scrape external sites to be able to download the files. What we should do is agree that this should stop, So far, I don't think anybody's talking to the right we for stopping it. It's the tools that control this, not PyPI. (PyPI can't actually stop the tools from using this information without also making itself a lot less useful to *humans* at the same time.) As far as my personal position on the matter, I think that it's reasonable to deprecate the scraping of home page and download links. As somebody pointed out, expired domains are a potentially nasty problem there. OTOH, I currently make development snapshots of setuptools and other projects available by dumping them in a directory that's used as an external download URL. Replacing that would be a PITA because PyPI only lets you upload and register new releases from distutils' command line. Basically, I'd need to use a download link that pointed to a latest URL that redirected to the final download. Anyway, I'm not seeing much discussion here about how to help authors make changes to their release processes. Note that many popular and long-lived projects (pywin32, PIL, etc.) have similar issues. 
(Not to mention the newer projects that host directly from revision control.) As far as I'm concerned, this is all about helping package maintainers. The way pip works now, every time I do a release candidate, pip automatically installs it, even though I only upload it to Google Code. I don't want it to do this, but the only way around it would be either 1. give it some weird name so that pip doesn't think it is newer 2. upload it somewhere else or 3. go in to PyPI and remove all mentions of Google Code from the index. And by the way, this hasn't been mentioned, but I really mean *all* mentions of Google Code on PyPI. pip crawls Google Code not just because Google Code is listed as an official site for my package or because the latest release is there, but because a single old release points there. So to get pip to not crawl there, I would have to go through and remove all old mentions of Google Code, even from releases that were made in 2006. So you can see why the expired domain scenario is a very real issue. And combined with the fact that everyone uses pip with sudo that was discussed on this list a while back, you have a hacker's dream for installing malicious code on everyone's computers. I also had the issue where pip was trying to install our documentation, because I named it sympy-0.7.1-doc, which it thought was newer than sympy-0.7.1. Again, I only uploaded that file to Google Code, not PyPI. And currently we have the issue where it tries to install the Python 2 tarball in Python 3, which is partially related to all this (it's all part of the gathering metadata from the filename instead of the PyPI classifiers). If we require that people upload files, we can additionally only gather metadata from classifiers.
If pip installs Python 2 code in Python 3, the solution isn't to try to trick it by some filename mangling (which won't work in easy_install, but oh well), but rather, just set the classifier for the download like you were supposed to in the first place, and it will just work. With this change if I (the package maintainer) do the right thing, pip does the right thing. The way it is now, if I do the right thing, pip does the wrong thing, and to make pip do the right thing, I have to trick it into doing so. So for me at least, the change to the release process is stop wasting my time figuring out how to trick pip, and just do things according to the PyPI classifier API (which I'm already doing anyway, just pip ignores it), and everything will work. Aaron Meurer Given that easy_install was deliberately designed so that those guys would *not* need to change their hosting strategies to get automated downloads, I'd like to see more talk about how we're going to help people change their releasing and hosting strategies.
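As a toy illustration of the filename problem described above (the documentation archive being treated as newer than the release), here is a sketch of what can happen when a version is guessed from a filename alone. This is deliberately naive and is not pip's actual algorithm; the function names, the `.tar.gz` extension on the file names, and the ordering rule are all assumptions made for the example.

```python
def version_from_filename(filename, project):
    """Guess a version string from an archive name (hypothetical helper)."""
    for ext in (".tar.gz", ".tgz", ".zip"):
        if filename.endswith(ext):
            filename = filename[: -len(ext)]
            break
    prefix = project + "-"
    if not filename.startswith(prefix):
        return None
    return filename[len(prefix):]


def naive_sort_key(version):
    """Naive ordering: numeric pieces compare as numbers, and any
    non-numeric piece sorts after a numeric one."""
    key = []
    for piece in version.replace("-", ".").split("."):
        if piece.isdigit():
            key.append((0, int(piece), ""))
        else:
            key.append((1, 0, piece))
    return tuple(key)


files = ["sympy-0.7.1.tar.gz", "sympy-0.7.1-doc.tar.gz"]
newest = max(files, key=lambda f: naive_sort_key(version_from_filename(f, "sympy")))
print(newest)  # the -doc archive wins under this naive ordering
```

Nothing in the filename says that `doc` is not part of the version, which is the argument for taking such facts from explicit metadata instead.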
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 4:50 PM, Donald Stufft donald.stu...@gmail.com wrote: Development snapshots are a use case that i'm not sure makes sense for PyPI, but if they do should require specific opt-in to install them. Does easy_install have a command line flag that adds extra links? *chuckle*. Yes, it's the original source of the --find-links option, emulated in pip to ease migration. can your instructions simply state to do the equivalent of `pip install --find-links=http://setuptools.com/dev-snapshopts/`? The problem with find-links is that if you push them off of PyPI, they have to go somewhere else, which is setuptools' dependency-links feature. Now you have an even *harder* problem to update or remove those links, because they're not under the control of the author nor visible on PyPI. Alternatively I would like to get the tooling smarter about not installing pre-release versions unless asked as well. Yes, and that discussion doesn't have much to do with PyPI per se, because again, it's up to the tools.
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 7:08 PM, PJ Eby wrote: On Wed, Feb 27, 2013 at 6:16 PM, Aaron Meurer asmeu...@gmail.com (mailto:asmeu...@gmail.com) wrote: As far as I'm concerned, this is all about helping package maintainers. The way pip works now, every time I do a release candidate, pip automatically installs it, even though I only upload it to Google Code. I don't want it to do this, but the only way around it would be either 1. give it some weird name so that pip doesn't think it is newer 2. upload it somewhere else or 3. go in to PyPI and remove all mentions of Google Code from the index. There's also a *fourth* way, which I asked the PyPI developers many years ago to do, which is to stop including download links on the /simple index for hidden (i.e., non-current) releases. (Something I am still in favor of, btw. Jim Fulton argued against it, IIRC, and it ended in a stalemate. However, I don't think we discussed distinguishing PyPI downloads from other downloads, just getting rid of old links in general) Frankly, just dropping /simple links for hidden releases would also fix a good chunk of expired domain, stale releases, too many downloads, etc. In addition, if a project migrates to using PyPI uploads, they will not still be subject to external downloads for older versions being crawled. So, if we must do away with the links, I would suggest that the phases be: 1. Remove homepage/download URLs for hidden versions from the /simple index altogether (leaving PyPI download links available) 2. Remove the rel=... attributes from the remaining download and home page links (this will stop off-site crawling, but not off-site downloading) 3. Re-evaluate whether anything else actually needs to be removed. This seems a bit complicated, people in general don't even know the external link spidering exists, much less understand the intricacies of what types of links get spidered when. 
A simple "After X date no new urls will be added and after Y date all existing urls will be removed" removes ambiguity from the process. Having "this kind of link will get removed Y and that matters in Z conditions" leads to a lot of confusion about what does and doesn't work. Basically, 99% of the complaints here are lumping together all of these different kinds of links -- stale links, spidered links, and plain external download links -- even though they don't create the same sorts of problems. Taking it in stages will give authors time to change processes, while still getting rid of the biggest problem sources right away (stale homepage/download URLs). My complaint is with external urls at all, for a myriad of reasons, some specific to particular cases of them, some not. The first of these changes could be done now, though I'd check with Jim about the buildout use case; IIRC it was to allow pinned versions. But if the main use cases also had eggs on PyPI rather than downloading them from elsewhere, then removing *just* the homepage/download links would clean things up nicely, including your runaway Google Code downloads, without needing to change any installer code that's out in the field right now.
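The difference between spidered links and plain download links in the message above can be sketched roughly as follows. This assumes that the /simple index marks home page and download URLs with `rel="homepage"` and `rel="download"` attributes, as the phases proposed above imply; the sample HTML, the project name `frobnitz`, and the names `SimpleIndexLinks` and `classify_links` are made up for the example, and no claim is made that any installer is written this way.

```python
from html.parser import HTMLParser

# rel values assumed to mark links an installer would crawl further.
CRAWLED_RELS = ("homepage", "download")


class SimpleIndexLinks(HTMLParser):
    """Collect (href, rel) pairs for every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            self.links.append((href, attributes.get("rel")))


def classify_links(page_html):
    """Split links into (crawled, plain) lists based on their rel value."""
    parser = SimpleIndexLinks()
    parser.feed(page_html)
    crawled = [h for h, rel in parser.links if rel in CRAWLED_RELS]
    plain = [h for h, rel in parser.links if rel not in CRAWLED_RELS]
    return crawled, plain


# Hypothetical index page for a made-up project.
SAMPLE = """
<a href="../../packages/source/f/frobnitz/frobnitz-1.0.tar.gz">frobnitz-1.0.tar.gz</a>
<a href="http://example.com/frobnitz/" rel="homepage">1.0 home_page</a>
<a href="http://example.com/dl/" rel="download">1.0 download_url</a>
"""

crawled, plain = classify_links(SAMPLE)
print("would be crawled:", crawled)
print("plain links:", plain)
```

Under that assumption, dropping the `rel` attribute moves a link from the first list to the second, which is why the second phase would stop off-site crawling without removing the links themselves.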
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 6:24 PM, Donald Stufft donald.stu...@gmail.com wrote: On Wednesday, February 27, 2013 at 8:13 PM, PJ Eby wrote: On Wed, Feb 27, 2013 at 7:36 PM, Donald Stufft donald.stu...@gmail.com wrote: This seems a bit complicated, people in general don't even know the external link spidering exists, much less understand the intricacies of what types of links get spidered when. A simple After X date no new urls will be added and after Y date all existing urls will be removed removes ambiguity from the process. Having this kind of link will get removed Y and that matters in Z conditions leads to a lot of confusion about what does and doesn't work. AFAICT, that's an argument in *favor* of phased removals, not against. (Also, you have the order backwards from my proposal, which is to *first* remove broken old junk in two phases. This is actually *less* problematic than doing it for new releases first. And of course the simplest thing of all would be to make no change at all.) The phased removals are a problem when people won't understand the differentiating factors between the different phases. Anyway, let's try to be a little bit less like the politicians who, upon being told that Something must be done!, turn around and pick any arbitrary value of something, and do that, so as to be seen to be doing something. But that is *exactly* what is happening now: people are proposing to create worse problems down the line by insisting on doing something right now (although never is often better, per the Zen of Python) without considering the consequences that will happen six months or so from now... when the users and toolmakers move the external links someplace else, that will have even *less* visibility, maintainability, and trust than they have now. 
This is not something I've just cooked up, it's been thought about since I stood up Crate a year ago, in fact there is a /simple/ index on Crate that flat out removes external links (as well as all the breakage that occurs). I'm well aware of the implications here. dependency_links cannot be controlled via PyPI (and in fact require a download to even trigger them if they are in setup.py) so that problem is outside of the realm of PyPI. Like I said I've already opened issues with pip/buildout about this, and I have every intention of seeing them through till completion. Can you give the links to the issues in their issue trackers, for those of us who want to follow the progress of this more closely? Aaron Meurer PyPI is one part of the overall remove automatic trolling of links from the index plan. This won't make your problems better, it will actually make them *worse*, for the sake of making what is essentially a political statement about how seriously the Python community values security. (This is especially the case because getting rid of the links won't actually get you to a secure system. The *actual* solution is code signing... which there is already a PEP for. Get the code signing done right, and the external links will be irrelevant.) Code signing only solves some problems, and this isn't just about security, (although it does play a major part) read my previous emails. Furthermore code signing is a larger change *and* it's a lot more difficult to get old releases to go back and sign their releases. This improves the overall security of these old releases even if we are unable to get them signed. Now, I am not saying that something doesn't need to be done, but it needs to be considered more carefully than just, First thing we do, let's kill all the links! A phase-out will not lose anything that isn't already lost.
(A parallel from Mercurial, when they added SSL cert verification: the warnings don't mean things are more insecure now, you're just getting informed now of how insecure they *already always were*.)
Re: [Catalog-sig] Deprecate External Links
On Wednesday, February 27, 2013 at 8:34 PM, Aaron Meurer wrote: On Wed, Feb 27, 2013 at 6:24 PM, Donald Stufft donald.stu...@gmail.com (mailto:donald.stu...@gmail.com) wrote: On Wednesday, February 27, 2013 at 8:13 PM, PJ Eby wrote: On Wed, Feb 27, 2013 at 7:36 PM, Donald Stufft donald.stu...@gmail.com (mailto:donald.stu...@gmail.com) wrote: This seems a bit complicated, people in general don't even know the external link spidering exists, much less understand the intricacies of what types of links get spidered when. A simple After X date no new urls will be added and after Y date all existing urls will be removed removes ambiguity from the process. Having this kind of link will get removed Y and that matters in Z conditions leads to a lot of confusion about what does and doesn't work. AFAICT, that's an argument in *favor* of phased removals, not against. (Also, you have the order backwards from my proposal, which is to *first* remove broken old junk in two phases. This is actually *less* problematic than doing it for new releases first. And of course the simplest thing of all would be to make no change at all.) The phased removals are a problem when people won't understand the differentiating factors between the different phases. Anyway, let's try to be a little bit less like the politicians who, upon being told that Something must be done!, turn around and pick any arbitrary value of something, and do that, so as to be seen to be doing something. But that is *exactly* what is happening now: people are proposing to create worse problems down the line by insisting on doing something right now (although never is often better, per the Zen of Python) without considering the consequences that will happen six months or so from now... when the users and toolmakers move the external links someplace else, that will have even *less* visibility, maintainability, and trust than they have now. 
This is not something I've just cooked up, it's been thought about since I stood up Crate a year ago, in fact there is a /simple/ index on Crate that flat out removes external links (as well as all the breakage that occurs). I'm well aware of the implications here. dependency_links cannot be controlled via PyPI (and in fact require a download to even trigger them if they are in setup.py) so that problem is outside of the realm of PyPI. Like I said I've already opened issues with pip/buildout about this, and I have every intention of seeing them through till completion. Can you give the links to the issues in their issue trackers, for those of us who want to follow the progress of this more closely? https://github.com/pypa/pip/issues/818 https://github.com/buildout/buildout/issues/92 Aaron Meurer
Re: [Catalog-sig] Deprecate External Links
maintainers. The way pip works now, every time I do a release candidate, pip automatically installs it, even though I only upload it an option to exclude pre-releases (or in reverse, an option to allow them) does seem overdue. reasons not to do this? anyone? links to the most relevant conversations/posts from the past? well), but rather, just set the classifier for the download like you were supposed to in the first place, and it will just work. With this change if I (the package maintainer) do the right thing, pip does the right thing. The way it is now, if I do the right thing, pip does the wrong thing it's not clear that trove classifiers is the consensus on how an installer should know about the python version. surfacing requires-python in pypi for installers (when metadata-version =1.2 actually becomes pervasive) seems like the right idea. but maybe an option to look at classifiers in the short term? not sure. Marcus
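As a sketch of what "an option to look at classifiers in the short term" might amount to, the snippet below reads the declared Python major versions out of a list of trove classifiers. The helper names `declared_majors` and `looks_compatible` are hypothetical, and treating a release with no version classifiers as compatible is an assumption of this example, not an agreed rule.

```python
PREFIX = "Programming Language :: Python :: "


def declared_majors(classifiers):
    """Return the set of Python major versions named in the classifiers."""
    majors = set()
    for classifier in classifiers:
        if not classifier.startswith(PREFIX):
            continue
        rest = classifier[len(PREFIX):].strip()
        # "2.7" -> "2", "3 :: Only" -> "3", "Implementation :: CPython" -> skipped
        head = rest.split(" ")[0].split(".")[0]
        if head.isdigit():
            majors.add(int(head))
    return majors


def looks_compatible(classifiers, running_major):
    """Assumption: no declared versions means 'unknown', which is allowed."""
    majors = declared_majors(classifiers)
    return not majors or running_major in majors


py2_only = [
    "Programming Language :: Python :: 2",
    "Programming Language :: Python :: 2.7",
    "Topic :: Scientific/Engineering :: Mathematics",
]
print(looks_compatible(py2_only, 2))
print(looks_compatible(py2_only, 3))
```

Since classifiers are optional and free-form, a check like this can only ever be a hint, which is the concern raised above about relying on them.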
Re: [Catalog-sig] Deprecate External Links
On Thu, Feb 28, 2013 at 6:27 AM, Donald Stufft donald.stu...@gmail.com wrote:

Sometimes you need to break things. The goal is to do it with ample warning and migration time so that people have a chance to move to the new way of doing things. Again, I am not suggesting we delete all external links immediately, just disable new ones. Removing old ones will come later.

This thread is long enough that I'm not sure where to weigh in. Here seems appropriate enough.

1. The next generation metadata infrastructure will NOT support external hosting of files indexed on PyPI - if you don't upload the archive files to PyPI, they won't be included in the next generation metadata. If you want external hosting, you will need to run a separate index (this is similar to the yum model - you can host files wherever you want, but you need to run yum createrepo yourself to generate the metadata, and instruct users on how to get their installers to retrieve your metadata. The big difference between PyPI and the yum model is that the default index still won't be curated at all, so there's no review process to get through if you want to use it, thus less need for external hosting).

2. Near term, with the current generation infrastructure, I think it's better to approach the problem *very* gently. Our political capital with users is low at this point, and we need to prioritise what things we want to make people angry about (whether or not we consider their anger justified is completely irrelevant). This proposal is for a transition that would take months. Since I want to have the next generation metadata up and running within months *anyway*, that means this strikes me as primarily a distraction from fixing the problem properly.

3. Various other problems raised in this thread will only be fixed with next generation metadata that the automated tools can *rely* on rather than having to guess the intended semantics.
That's why PEP 426 is now explicit about pre-release handling, and why it makes version specifiers like (for example) Requires-Python: 2.6 exclude Python 3 by default (although the thread does raise an interesting question of whether or not you can cleanly specify dual Python 2 and 3 support given the current state of PEP 426).

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
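One way a specifier such as Requires-Python: 2.6 could exclude Python 3 by default is if a bare version is read as a "compatible release" clause: at least 2.6, and still within the 2.x series. That reading is an assumption made for this sketch, not a quotation from PEP 426, and the helper names below are hypothetical.

```python
def parse(version):
    """Turn a dotted numeric version such as '2.6' into a tuple like (2, 6)."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(spec, candidate):
    """ASSUMED semantics: candidate >= spec and same leading components.

    For spec '2.6' this means >= 2.6 and still in the 2.x series.
    """
    spec_t = parse(spec)
    cand_t = parse(candidate)
    prefix = spec_t[:-1]  # everything except the last component of the spec
    return cand_t >= spec_t and cand_t[:len(prefix)] == prefix


def any_compatible(specs, candidate):
    """Hypothetical union of clauses, e.g. to express dual 2.x/3.x support."""
    return any(compatible_release(spec, candidate) for spec in specs)
```

Under this reading a single clause cannot cover both series, which is the dual-support question raised above; something like the hypothetical `any_compatible(["2.6", "3.2"], ...)` union would be needed, and whether the metadata format allows that is exactly what is left open.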
Re: [Catalog-sig] Deprecate External Links
On Thursday, February 28, 2013 at 1:39 AM, Nick Coghlan wrote:
On Thu, Feb 28, 2013 at 6:27 AM, Donald Stufft donald.stu...@gmail.com wrote:

Sometimes you need to break things. The goal is to do it with ample warning and migration time so that people have a chance to move to the new way of doing things. Again, I am not suggesting we delete all external links immediately, just disable new ones. Removing old ones will come later.

This thread is long enough that I'm not sure where to weigh in. Here seems appropriate enough.

1. The next generation metadata infrastructure will NOT support external hosting of files indexed on PyPI - if you don't upload the archive files to PyPI, they won't be included in the next generation metadata. If you want external hosting, you will need to run a separate index (this is similar to the yum model - you can host files wherever you want, but you need to run yum createrepo yourself to generate the metadata, and instruct users on how to get their installers to retrieve your metadata. The big difference between PyPI and the yum model is that the default index still won't be curated at all, so there's no review process to get through if you want to use it, thus less need for external hosting).

2. Near term, with the current generation infrastructure, I think it's better to approach the problem *very* gently. Our political capital with users is low at this point, and we need to prioritise what things we want to make people angry about (whether or not we consider their anger justified is completely irrelevant). This proposal is for a transition that would take months. Since I want to have the next generation metadata up and running within months *anyway*, that means this strikes me as primarily a distraction from fixing the problem properly.
I'm glad the next set of metadata won't have external links; however, even if it showed up tomorrow it's going to be a long time until people are completely migrated to it. Furthermore, you estimate months, but the first phase will have positive benefits right away, namely that it will prompt people to start uploading their packages, increasing the security and reliability of the current system. And finally, while I'm glad to see forward movement, it's been said before not to bother making a fix to the existing system because X was going to happen soon; in the past it was distutils2/packaging, now it's PEP 426/packaging. While I have every hope, and I believe it will happen this time, the past has made me worry about holding off on good incremental improvements to the current infra.

3. Various other problems raised in this thread will only be fixed with next generation metadata that the automated tools can *rely* on rather than having to guess the intended semantics. That's why PEP 426 is now explicit about pre-release handling, and why it makes version specifiers like (for example) Requires-Python: 2.6 exclude Python 3 by default (although the thread does raise an interesting question of whether or not you can cleanly specify dual Python 2 and 3 support given the current state of PEP 426).

Pre-release handling doesn't require anything new to handle (https://github.com/pypa/pip/issues/820); requires-python will, but that's a separate issue really.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia