PyPI is now being served with a valid SSL certificate, and the
tooling has begun to incorporate SSL verification of PyPI into
the process. This is _excellent_ and the parties involved should
all be thanked. However, there is still another massive area of
insecurity within the packaging toolchain.

For those who don't know, when you attempt to install a particular
package, a number of URLs are visited. The steps look roughly
like this:

    1. Visit /simple/Package/ and attempt to collect any links
        that look installable (tarballs, #egg=, etc.).
        Note: /simple/Package/ contains the download_url, the
        home_page, and any link that is contained in the
        long_description.
    2. Visit any link referenced as home_page and attempt to
        collect any links that look installable.
    3. Visit any link referenced in dependency_links and attempt
        to collect any links that look installable.
    4. Take all of the collected links and determine which one
        best matches the requirement spec given and download it.
    5. Rinse and repeat for every dependency in the requirement set.
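The link collection in steps 1-3 can be sketched roughly like this. This is illustrative, not pip's actual code; the class and variable names are made up, and it only covers classifying anchors on an already-fetched /simple/ page (old simple pages marked external links with rel="homepage" and rel="download"):

```python
from html.parser import HTMLParser

ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".zip", ".tgz")

class LinkCollector(HTMLParser):
    """Hypothetical collector: sorts anchors on a /simple/Package/ page
    into directly-installable links and external links to spider next."""

    def __init__(self):
        super().__init__()
        self.candidates = []  # links that look installable (step 1)
        self.external = []    # home_page / download_url links (steps 2-3)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href", "")
        rel = attrs.get("rel", "")
        # Direct archives or #egg= fragments look installable.
        if href.endswith(ARCHIVE_SUFFIXES) or "#egg=" in href:
            self.candidates.append(href)
        # rel="homepage"/"download" links would be visited next.
        elif rel in ("homepage", "download"):
            self.external.append(href)

sample = (
    '<a href="Foo-1.0.tar.gz">Foo-1.0.tar.gz</a>'
    '<a rel="homepage" href="http://example.com/foo">home</a>'
    '<a href="http://example.com/foo-1.0#egg=Foo-1.0">egg link</a>'
)
collector = LinkCollector()
collector.feed(sample)
print(collector.candidates)  # the archive and the #egg= link
print(collector.external)    # the host that would be spidered next
```

Every entry in `external` is another host the installer must contact, which is exactly the surface this proposal wants to shrink.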

I propose we deprecate the external links that PyPI publishes
on the /simple/ indexes, which exist for historical reasons.
Ideally, in some number of months (1? 2?) we would stop adding
these links for new releases, leaving the existing ones intact,
and then a few months later remove the existing links completely.

  1. It is difficult to secure the process of spidering external links
    for download.
    1a. The only way I can think of offhand is requiring a hash of
          the expected file to be uploaded to PyPI along with the
          download link, and removing all URLs except for the
          download_url. This has the side effect that only one file
          can be associated with a particular release.
  2. External links decrease the expected uptime for a particular set
      of requirements. PyPI itself has become very stable, but the
      same cannot be said for all of the external hosts the toolchain
      processes. Each new host is an additional SPOF.

      Ex: If I depend on PyPI and 10 other external hosts, and each
            service has a 99% uptime, then my expected uptime to
            be able to install all my requirements would be
            ~89.5% (0.99 ** 11).
  3. External links break the ability of a CDN and/or mirroring
      infrastructure to provide increased uptime and better
      latency/throughput across the globe.
  4. Privacy implications: as a user, it is not particularly obvious
      when I run `pip install Foo` what hosts I will be issuing
      requests to. It is obvious that I will be contacting PyPI, and
      I have made the decision to trust PyPI; however, it is not
      obvious what other hosts will be able to gather information
      about me, including what packages I am installing. This becomes
      even harder to determine the deeper my dependency tree goes.
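The verification in 1a boils down to comparing a digest of the downloaded bytes against a hash uploaded to PyPI alongside the download link. A minimal sketch, assuming SHA-256 and a hypothetical helper name (nothing here is an existing PyPI or pip API):

```python
import hashlib

def verify_download(data: bytes, expected_sha256: str) -> bool:
    """Return True only if the downloaded bytes match the hash
    that was registered on PyPI with the download link."""
    return hashlib.sha256(data).hexdigest() == expected_sha256

# Simulate an upload: the author registers the hash of the real sdist.
payload = b"fake sdist contents"
registered = hashlib.sha256(payload).hexdigest()

print(verify_download(payload, registered))                # matches
print(verify_download(payload + b"tampered", registered))  # rejected
```

Note how this pins exactly one file to the registered hash, which is the limitation mentioned in 1a: a second file for the same release would need its own hash entry.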

Catalog-SIG mailing list
