On Wednesday, February 27, 2013 at 10:26 AM, Donald Stufft wrote:
> PyPI is now being served with a valid SSL certificate, and the
> tooling has begun to incorporate SSL verification of PyPI into
> the process. This is _excellent_ and the parties involved should
> all be thanked. However, there is still another massive area of
> insecurity within the packaging tool chain.
>
> For those who don't know, when you attempt to install a particular
> package, a number of urls are visited. The steps look roughly
> something like this:
>
> 1. Visit http://pypi.python.org/simple/Package/ and attempt to
>    collect any links that look installable (tarballs,
>    #egg=, etc).
>    Note: /simple/Package/ contains the download_url, the home_page,
>    and any link that is contained in the long_description.
> 2. Visit any link referenced as home_page and attempt to
>    collect any links that look installable.
> 3. Visit any link referenced in dependency_links and attempt
>    to collect any links that look installable.
> 4. Take all of the collected links and determine which one
>    best matches the requirement spec given, and download it.
> 5. Rinse and repeat for every dependency in the requirement
>    set.
>
> I propose we deprecate the external links that PyPI has published
> on the /simple/ indexes, which exist because of the history of PyPI.
> Ideally, in some number of months (1? 2?) we would stop adding
> these links for new releases, leaving the existing ones intact, and
> then a few months later remove the existing links completely.
>
> Reasoning:
> 1. It is difficult to secure the process of spidering external links
>    for download.
>    1a. The only way I can think of offhand is to require uploading
>        a hash of the expected file to PyPI along with the download
>        link, and to remove all urls except for the download_url. This
>        has the side effect that only one file can be associated with a
>        particular release.
> 2. External links decrease the expected uptime for a particular set
>    of requirements.
> PyPI itself has become very stable; however,
> the same cannot be said for all of the externally linked hosts that the
> toolchain processes. Each new host is an additional SPOF.
>
> Ex: I depend on PyPI and 10 other externally hosted packages; each
> service has 99% uptime, so my expected uptime to
> be able to install all my requirements would be ~89% (0.99 ** 11).
> 3. Breaks the ability for a CDN and/or mirroring infrastructure to provide
>    increased uptime and better latency/throughput across the globe.
> 4. Privacy implications: as a user, it is not particularly obvious when
>    I run `pip install Foo` what hosts I will be issuing requests against.
>    It is obvious that I will be contacting PyPI, and I will have made the
>    decision to trust PyPI; however, it is not obvious what other hosts
>    will be able to gather information about me, including what packages
>    I am installing. This becomes even more difficult to determine the
>    deeper my dependency tree goes.

I fully support this.

As an aside, if CDN/storage concerns are an issue, I have an outstanding
offer from a large hosting company to take care of the CDN aspects for us.

Jesse

_______________________________________________
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig
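[Editor's note: the link-collection process in steps 1-4 of the quoted email can be sketched roughly as follows. The "looks installable" heuristic and the helper names are simplified illustrations, not pip's or setuptools' actual implementation.]

```python
import re

# Very rough stand-in for the installer's "looks installable" heuristic:
# sdist/zip filenames, or VCS-style links carrying an #egg= fragment.
INSTALLABLE = re.compile(r'\.(?:tar\.gz|tgz|zip)$|#egg=', re.IGNORECASE)

def installable_hrefs(html):
    """Return the hrefs on a page that look like installable files."""
    hrefs = re.findall(r'href="([^"]+)"', html)
    return [h for h in hrefs if INSTALLABLE.search(h)]

def candidate_links(package, fetch):
    """Step 1: scrape /simple/<package>/ for installable links.
    fetch(url) -> html is injected so the spidering is visible."""
    index = "http://pypi.python.org/simple/%s/" % package
    links = installable_hrefs(fetch(index))
    # Steps 2-3 (omitted): every home_page and dependency_links URL on
    # that page would be fetched and scraped the same way -- each one an
    # extra host the user implicitly trusts, and an extra SPOF.
    return links
```

The point the sketch makes concrete: every external page scraped in steps 2-3 is another `fetch()` against a host the user never chose to trust.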
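[Editor's note: the hash-pinning idea in point 1a could look something like the sketch below. The function name, the choice of SHA-256, and the error handling are all hypothetical; the email only proposes "a hash of the expected file" alongside the download link.]

```python
import hashlib

def verify_download(data, expected_sha256):
    """Reject a downloaded file whose digest does not match the one
    published on PyPI next to the download_url."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError("digest mismatch: got %s" % actual)
    return data
```

Because the pinned hash identifies exactly one file, this scheme has the limitation the email notes: only one file can be associated with a release.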
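[Editor's note: the uptime arithmetic in point 2 checks out; a one-liner makes it explicit.]

```python
# Installing succeeds only if every host is up: PyPI plus 10 external
# download locations, each independently available 99% of the time.
hosts = 11
per_host_uptime = 0.99
expected = per_host_uptime ** hosts
print("%.1f%%" % (expected * 100))  # ~89.5%, the ~89% cited in the email
```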