Just an update: asyncmongo has now released to PyPI, so I’ve removed it from the gists as well. Still no word back from PIL.

On May 18, 2014, at 11:21 AM, Donald Stufft <[email protected]> wrote:
>
> On May 18, 2014, at 2:20 AM, holger krekel <[email protected]> wrote:
>
>> On Sat, May 17, 2014 at 20:20 -0400, Donald Stufft wrote:
>>> On May 17, 2014, at 1:51 PM, holger krekel <[email protected]> wrote:
>>>
>>>> On Sat, May 17, 2014 at 11:32 -0400, Donald Stufft wrote:
>>>>> More conclusions!
>>>>>
>>>>> In that same time period PyPI received a total of ~16463209 hits to a
>>>>> page on the simple installer API. This means that in total these
>>>>> projects represent a combined 0.56% of the simple installer traffic on
>>>>> PyPI. However looking at the numbers you can see that PIL is an obvious
>>>>> outlier with the hits dropping drastically after that. PIL on its own
>>>>> represents 0.44% of the hits on PyPI during that time period leaving
>>>>> only 0.12% for anything not PIL.
>>>>
>>>> So the current numbers roughly mean that around 92193 end-user sites per
>>>> day depend on crawling currently, right? Do you know if these are also
>>>> unique IPs (they might indicate duplicates although companies also have
>>>> NATting firewalls)?
>>>>
>>>> holger
>>>
>>> Here’s the number of IP addresses that accessed each /simple/ page per day.
>>>
>>> https://gist.github.com/dstufft/347112c3bcc91220e4b2
>>>
>>> Unique IPs: 95541
>>> Unique IPs for Only Hosted off PyPI: 8248 (8.63%)
>>> Unique IPs for Only Hosted off PyPI w/o PIL: 2478 (2.59%)
>>>
>>> It's important to remember when looking at these numbers that almost all
>>> of them represent something downloading a package unsafely which will
>>> generally contain Python code that will then be executed. Breaking the
>>> unsafe thing is, in my opinion, non-optional and the only thing needed to
>>> be discussed about it is how to go about doing it exactly.
>>> The safe thing I think *should* be removed for the various other reasons
>>> that have been outlined and it only represents a tiny fraction of uses.
>>>
>>> The numbers to be specific are, 8248 of the above 8248 IPs downloaded
>>> something unsafely, while 214 of them also downloaded something safely.
>>> That means that 100% of the 8248 addresses could have been attacked
>>> through their use of PyPI and only 2.59% downloaded anything that was
>>> safely hosted off of PyPI.
>>>
>>> Looking at the same numbers for projects which have *any* files hosted
>>> off of PyPI (the numbers thus far have been projects which have *only*
>>> files hosted off of PyPI) I see that 35046 IP addresses accessed a project
>>> that had any files unsafely hosted off of PyPI while only 2852 IP
>>> addresses accessed a project that had any files safely hosted off of PyPI.
>>>
>>> That means that roughly a minimum floor of ~36% of the users of PyPI were
>>> vulnerable to a MITM attack on 2014-05-14 unless they were using pip 1.5
>>> without any --allow-unverified flags or they were using pip 1.4 with
>>> --no-allow-insecure and even in that case they could still be vulnerable
>>> if there is any use of setup_requires. I say that's a minimum because that
>>> only counts the projects where I happened to find a file hosted unsafely
>>> externally. It does not count at all any projects for which I did not find
>>> a file like that but which still have locations on their simple page like
>>> that. This is especially troublesome for projects where they have old
>>> domain names in those links that point to domains that are no longer
>>> registered.
>>>
>>> Also just FYI I've removed pyPDF from both lists as I've contacted the
>>> author and there are packages now hosted on PyPI for it. I've also
>>> contacted PIL and a few other authors (of which I've just heard back from
>>> cx_Oracle and they appear to be willing to upload as well).
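
As a quick sanity check on the figures quoted above, here is a minimal sketch that recomputes the percentages from the raw counts. The counts are copied from the message; the variable names and the `pct` helper are made up for this illustration and are not part of any PyPI tooling.

```python
# Counts copied from the quoted message (sample day: 2014-05-14).
total_ips = 95541                 # unique IPs hitting any /simple/ page
only_external_ips = 8248          # IPs hitting projects hosted only off PyPI
only_external_no_pil = 2478       # same, with PIL excluded
also_safe_ips = 214               # of the 8248, those that also downloaded safely
any_unsafe_external_ips = 35046   # IPs hitting projects with any unsafe external file


def pct(part, whole):
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


shares = {
    "only hosted off PyPI": pct(only_external_ips, total_ips),
    "only hosted off PyPI, w/o PIL": pct(only_external_no_pil, total_ips),
    "safe downloads among the 8248": pct(also_safe_ips, only_external_ips),
    "any unsafe external file": pct(any_unsafe_external_ips, total_ips),
}

for label, value in shares.items():
    print("%-32s %6.2f%%" % (label, value))
```

The last share is where the "~36%" minimum floor comes from. Both 2.59% figures are real but have different denominators (95541 for the w/o PIL count, 8248 for the safe downloads).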
>>
>> Thanks Donald for both the numbers and contacting some key authors which
>> I think is a very good move! I suggest to now wait a week or so to see
>> where we stand then, update the numbers and then try to settle on
>> crawl-deprecation paths.
>>
>> Also, let's please just talk about "checksummed" packages or integrity.
>> Even all pypi hosted packages are unsafe in the sense that they
>> might contain bad code from malicious uploaders or http-interceptors
>> that executes on end-user machines during installation. Thus the term
>> "safe" is misleading and should not be used when communicating to
>> end-users. Currently, we can only say or improve anything related to
>> integrity: what people download is what was uploaded by whoever happened
>> to have the credentials (*) or MITM access on http upload. Speaking of the
>> latter, maybe we should also think about moving to https uploads and
>> certificate-pinning, and that also for installers. And also, as Marius
>> pointed out, pypi is currently using the relatively weak MD5 hash.
>
> The problem with upload is when people use setup.py upload they are
> oftentimes using the upload from distutils. Since that is in the standard
> library we can't really go backwards in time and make it safe. People who
> use my twine utility to upload instead of setup.py upload are not
> vulnerable to MITM on upload.
>
> While I don't particularly like the MD5 hash, it's not true that the MD5
> hash currently presents a problem against the threat model that we're
> worried about. It's relatively easy to generate a collision attack, which
> would mean that a malicious author could generate two packages, an unsafe
> and a safe one that hashed to the same thing. However MD5 is still
> resistant to 2nd preimage attacks so an attacker could not create a
> package that hashes to a given hash.
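
To make the "checksummed" idea above concrete, here is a minimal sketch of an integrity check: hash the downloaded bytes and compare against the digest published alongside the link. This is not pip's actual implementation. The function names `digest_bytes` and `matches_expected` are hypothetical, and the sketch defaults to SHA-256 rather than the MD5 that PyPI uses at the moment.

```python
import hashlib
import hmac


def digest_bytes(data, algorithm="sha256"):
    """Return the hex digest of `data` using the named hashlib algorithm."""
    hasher = hashlib.new(algorithm)
    hasher.update(data)
    return hasher.hexdigest()


def matches_expected(data, expected_hex, algorithm="sha256"):
    """Return True if `data` hashes to `expected_hex` (case-insensitive)."""
    actual = digest_bytes(data, algorithm)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_hex.lower())


# Usage: the "published" digest here is computed locally, only for the demo.
payload = b"pretend this is a downloaded sdist"
published = digest_bytes(payload)

print(matches_expected(payload, published))                # untouched download
print(matches_expected(payload + b" tampered", published))  # modified in transit
```

A check like this only helps when the expected digest arrives over a channel the attacker cannot modify. It shows that the file matches what was uploaded, not that the uploaded code is trustworthy.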
>
>>
>> Without resolving these issues we can not even truthfully declare
>> integrity as something that the pypi-hosted packages themselves are
>> providing.
>
> We cannot fix every problem at once. Right now the tools exist for authors
> to make it possible to do everything safely. The externally hosted files
> represent an easier to exploit attack than a MITM on author upload. The
> MITM requires a privileged network position on specific individuals who
> are also not using twine or the browser to upload their distributions.
>
> Attacking people who are installing these packages is far easier. It would
> either require a privileged network position on one of ~90k IP addresses
> on any particular day (a much easier feat than for authors periodically)
> or, even easier, locate an expired domain registration and simply register
> the domain which wouldn't require a privileged network position at all.
>
>>
>> best,
>> holger
>>
>> (*) did you happen to have run some password crackers against
>> the pypi database? Might be a larger attack vector than hijacking
>> DNS entries.
>
> No I have not. The database currently uses bcrypt with a work factor of 12
> which makes it computationally hard for me to brute force passwords for
> all ~30k users which have a password set. If there was a specific user I
> was interested in a smart brute force attack might be able to locate
> something. Rate-limiting login attempts is also on the list of things to
> add in Warehouse.
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>
> _______________________________________________
> Distutils-SIG maillist - [email protected]
> https://mail.python.org/mailman/listinfo/distutils-sig

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
