On Wed, Feb 27, 2013 at 8:26 AM, Donald Stufft <donald.stu...@gmail.com> wrote:
> PyPI is now being served with a valid SSL certificate, and the
> tooling has begun to incorporate SSL verification of PyPI into
> the process. This is _excellent_ and the parties involved should
> all be thanked. However there is still another massive area of
> insecurity within the packaging tool chain.
>
> For those who don't know, when you attempt to install a particular
> package a number of urls are visited. The steps look roughly
> something like this:
>
> 1. Visit http://pypi.python.org/simple/Package/ and attempt to
>    collect any links that look installable (tarballs,
>    #egg=, etc).
>    (Note: /simple/Package/ contains download_url, home_page,
>    and any link that is contained in the long_description.)
> 2. Visit any link referenced as home_page and attempt to
>    collect any links that look installable.
> 3. Visit any link referenced in a dependency_links and attempt
>    to collect any links that look installable.
> 4. Take all of the collected links and determine which one
>    best matches the requirement spec given and download it.
> 5. Rinse and repeat for every dependency in the requirement
>    set.
>
> I propose we deprecate the external links that PyPI has published
> on the /simple/ indexes which exist because of the history of PyPI.
> Ideally in some number of months (1? 2?) we would turn off adding
> these links from new releases, leaving the existing ones intact and
> then a few months later the existing links be removed completely.
>
> Reasoning:
> 1. It is difficult to secure the process of spidering external links
>    for download.
>    1a. The only way I can think of offhand is by requiring uploading
>        a hash of the expected files to PyPI along with the download
>        link and removing all urls except for the download_url. This
>        has the effect that only 1 file can be associated with a
>        particular release.
> 2. External links decrease the expected uptime for a particular set
>    of requirements. PyPI itself has become very stable, however
>    the same cannot be said for all of the linked hosts that the
>    toolchain processes. Each new host is an additional SPOF.
>
>    Ex: I depend on PyPI and 10 other external packages, each
>    service has a 99% uptime so my expected uptime to
>    be able to install all my requirements would be ~89% (0.99 ** 11).
> 3. Breaks the ability for a CDN and/or mirroring infrastructure to
>    provide increased uptime and better latency/throughput across
>    the globe.
> 4. Privacy implications, as a user it is not particularly obvious when
>    I run `pip install Foo` what hosts I will be issuing requests
>    against. It is obvious that I will be contacting PyPI and I will
>    have made the decision to trust PyPI however it is not obvious what
>    other hosts will be able to gather information about me, including
>    what packages I am installing. This becomes even more difficult to
>    determine the deeper my dependency tree goes.
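
To make step 4 of the quoted process concrete, here is a minimal sketch of what "take all of the collected links and pick the best match" can look like when the host a link came from is ignored. This is not pip's or easy_install's actual code. The names Candidate, pick_naive and pick_index_only, the example URLs, and the purely numeric version parsing are all made up for illustration.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class Candidate(NamedTuple):
    """A hypothetical installable link found while spidering."""
    version: str
    url: str


def version_key(version: str) -> tuple:
    # Deliberately simplified: handles only dotted numeric versions.
    return tuple(int(part) for part in version.split("."))


def pick_naive(candidates: list) -> Candidate:
    # Highest version wins, no matter which host served the link.
    return max(candidates, key=lambda c: version_key(c.version))


def pick_index_only(candidates: list,
                    trusted_host: str = "pypi.python.org") -> Optional[Candidate]:
    # Only consider files hosted on the index itself.
    trusted = [c for c in candidates
               if urlparse(c.url).hostname == trusted_host]
    if not trusted:
        return None
    return max(trusted, key=lambda c: version_key(c.version))


# Hypothetical links: one file on the index, one on an old external host.
candidates = [
    Candidate("1.2", "https://pypi.python.org/packages/source/F/Foo/Foo-1.2.tar.gz"),
    Candidate("1.3", "http://old-downloads.example.com/Foo-1.3.tar.gz"),
]

naive = pick_naive(candidates)
safe = pick_index_only(candidates)
print("naive pick:     ", naive.url)
print("index-only pick:", safe.url)
```

In this sketch the naive selection returns the file from the external host simply because its version number is higher, while the index-only selection stays with the file hosted on PyPI.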

5. This is a serious PITA for package maintainers. If you accidentally
upload a file somewhere else that looks like a newer version, pip will
install that.

6. It's a huge security hole. To publish a malicious package, someone
just has to gain access to some site that is crawled by pip, which
includes all old download sites. If someone used to use some download
domain but no longer owns it, it is very easy for someone else to
upload an arbitrary malicious file with a slightly newer version
number, and pip will happily install that for everyone.

This was discussed at
http://mail.python.org/pipermail/catalog-sig/2012-June/004518.html. My
suggestion was to only download from the explicit external download
link for the latest listed version, and to do so only if an uploaded
file didn't exist. At the very least, let package maintainers manually
enable this behavior, so that we don't have to worry about tricking
pip/easy_install into installing the right thing by version number
naming (which is completely broken btw. It's impossible to name
separate Python 2 and Python 3 packages so that both pip and
easy_install will do the right thing in every case. See
https://code.google.com/p/sympy/issues/detail?id=3511).

Aaron Meurer
_______________________________________________
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig