On May 17, 2014, at 1:51 PM, holger krekel <[email protected]> wrote:
> On Sat, May 17, 2014 at 11:32 -0400, Donald Stufft wrote:
>> More conclusions!
>>
>> In that same time period PyPI received a total of ~16463209 hits to a page on
>> the simple installer API. This means that in total these projects represent
>> a combined 0.56% of the simple installer traffic on PyPI. However looking at
>> the numbers you can see that PIL is an obvious outlier with the hits dropping
>> drastically after that. PIL on its own represents 0.44% of the hits on PyPI
>> during that time period leaving only 0.12% for anything not PIL.
>
> So the current numbers roughly mean that around 92193 end-user sites per
> day depend on crawling currently, right? Do you know if these are also
> unique IPs (they might indicate duplicates although companies also have
> NATting firewalls)?
>
> holger

Here’s the number of IP addresses that accessed each /simple/ page per day.

https://gist.github.com/dstufft/347112c3bcc91220e4b2

Unique IPs: 95541
Unique IPs for Only Hosted off PyPI: 8248 (8.63%)
Unique IPs for Only Hosted off PyPI w/o PIL: 2478 (2.59%)

It's important to remember when looking at these numbers that almost all of
them represent something downloading a package unsafely, and that package will
generally contain Python code that will then be executed. Breaking the unsafe
thing is, in my opinion, not optional, and the only thing that needs to be
discussed about it is exactly how to go about doing it. The safe thing I think
*should* be removed for the various other reasons that have been outlined, and
it only represents a tiny fraction of uses.

To be specific, 8248 of the above 8248 IPs downloaded something unsafely,
while 214 of them also downloaded something safely. That means that 100% of
the 8248 addresses could have been attacked through their use of PyPI and only
2.59% downloaded anything that was safely hosted off of PyPI.
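For anyone who wants to check the percentages above, here is a minimal sketch that recomputes them from the counts quoted in this message. The variable names and the pct() helper are illustrative only and are not part of the gist or of any PyPI tooling.

```python
# Counts copied from the message above (unique IPs hitting /simple/ pages).
TOTAL_IPS = 95541             # all unique IPs
ONLY_OFF_PYPI = 8248          # IPs hitting projects hosted only off PyPI
ONLY_OFF_PYPI_NO_PIL = 2478   # same, excluding PIL
UNSAFE = 8248                 # of ONLY_OFF_PYPI, downloaded something unsafely
ALSO_SAFE = 214               # of ONLY_OFF_PYPI, also downloaded something safely


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Shares of all unique IPs.
share_off_pypi = pct(ONLY_OFF_PYPI, TOTAL_IPS)
share_off_pypi_no_pil = pct(ONLY_OFF_PYPI_NO_PIL, TOTAL_IPS)

# Shares within the "only hosted off PyPI" group.
share_unsafe = pct(UNSAFE, ONLY_OFF_PYPI)
share_also_safe = pct(ALSO_SAFE, ONLY_OFF_PYPI)

if __name__ == "__main__":
    print("off PyPI:", share_off_pypi)
    print("off PyPI w/o PIL:", share_off_pypi_no_pil)
    print("unsafe within group:", share_unsafe)
    print("also safe within group:", share_also_safe)
```

Note that the two 2.59% figures have different denominators: one is 2478 out of all 95541 IPs, the other is 214 out of the 8248 IPs in the off-PyPI group.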
Looking at the same numbers for projects which have *any* files hosted off of
PyPI (the numbers thus far have been for projects which have *only* files
hosted off of PyPI), I see that 35046 IP addresses accessed a project that had
any files unsafely hosted off of PyPI, while only 2852 IP addresses accessed a
project that had any files safely hosted off of PyPI.

That means that, as a floor, roughly 36% of the users of PyPI were vulnerable
to a MITM attack on 2014-05-14 unless they were using pip 1.5 without any
--allow-unverified flags or they were using pip 1.4 with --no-allow-insecure,
and even in that case they could still be vulnerable if there is any use of
setup_requires.

I say that's a minimum because it only counts the projects where I happened to
find a file hosted unsafely externally. It does not count any projects where I
did not find such a file but which still have links to such locations on their
simple page. This is especially troublesome for projects whose links point to
old domain names that are no longer registered.

Also, just FYI, I've removed pyPDF from both lists, as I've contacted the
author and there are now packages hosted on PyPI for it. I've also contacted
PIL and a few other authors (of which I've only heard back from cx_Oracle so
far, and they appear to be willing to upload as well).

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
_______________________________________________
Distutils-SIG maillist - [email protected]
https://mail.python.org/mailman/listinfo/distutils-sig
