I'll just say that when I blogged about PyPI security at https://snarky.ca/how-do-you-verify-pypi-can-be-trusted/, the idea Nick is proposing was along the lines of what I had in mind as a solution to the non-typosquatting side of the security problem. Which is to say it should be easy enough to explain to non-security folk if I could figure it out. ;)
Nick Coghlan wrote:
> On 1 January 2015 at 05:51, Donald Stufft don...@stufft.io wrote:
> > So here is my problem. I’m completely on board with the developer signing for the distribution files. I think that makes total sense. However I worry that requiring the developer to sign for what is essentially the “installer” API (aka how pip discovers things to install) is going to put us in a situation where we cannot evolve the API easily. If we modified this PEP so that an online key signed for /simple/ what security properties would we lose?
> >
> > It appears to me that the problem then would be that a compromise of PyPI can present whatever information they want to pip as to what is available for pip to download and install. This would mean freeze attacks, mix and match attacks. It would also mean that they could, in a future world where pip can use metadata on PyPI to do dependency resolution, tell pip that it needs to download a valid but malicious project as a dependency of a popular project like virtualenv.
> >
> > However I don’t think they’d be able to actually cause pip to install a malicious copy of a good project. I also believe that we can protect against an attacker who possesses that key tricking pip into installing a malicious but valid project as a fake dependency, by having pip use the theoretical future PyPI metadata that lists dependencies only as an optimization hint for what it should download. Once it has actually downloaded a project like virtualenv (which has been validated to be from the real author), pip would peek inside that file and ensure that the metadata inside it matches what PyPI told pip.
> >
> > Is my assessment correct? Is keeping the “API” under control of PyPI a reasonable thing to do while keeping the actual distribution files themselves under control of the distribution authors? The reason this worries me is that unlike a Linux distribution or an application like Firefox or so we don’t have much of a relationship with the people who are uploading things to PyPI. So if we need to evolve the API we are not going to be able to compel our authors to go back and re-generate new signed metadata.
>
> I think this is a good entry point for an idea I've had kicking around in my brain for the past couple of days: what if we change the end goal of PEP 480 slightly, from "prevent attackers from compromising published PyPI metadata" to "allow developers & administrators to rapidly detect and recover from compromised PyPI metadata"?
>
> My reasoning is that when it comes to PyPI security, there are actually two major dials we can twiddle:
>
> * raising the cost of an attack (e.g. making compromise harder by distributing signing authority to developers)
> * reducing the benefit of an attack (e.g. making the expected duration, and hence reach, of a compromise lower, or downgrading an artifact substitution attack to a denial of service attack)
>
> To raise the cost of a compromise through distributed signing authority, we have to solve the trust management problem - getting developer keys out to end users in a way that doesn't involve trusting the central PyPI service. That's actually a really difficult problem to solve, which is why we have situations like TLS still relying on the CA system, despite the known problems with the latter.
>
> However, the latter objective is potentially more tractable: we wouldn't need to distribute trust management out to arbitrary end users, we'd "just" need a federated group of entities that are in a position to detect that PyPI has potentially been compromised, and request a service shutdown until such time as the compromise has been investigated and resolved.
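
Donald's suggestion earlier in the quoted text, that pip treat dependency data from PyPI only as a download hint and then confirm it against the metadata inside the author-validated file, might look roughly like the sketch below. This is only an illustration: `compare_dependency_hint` and `_normalize` are hypothetical names, not pip APIs, and the string normalisation is a crude stand-in for real requirement parsing (a real tool would more likely parse requirements properly, for example with the `packaging` library).

```python
def _normalize(requirement: str) -> str:
    """Crude normalisation (assumption): lowercase and drop all whitespace."""
    return "".join(requirement.lower().split())


def compare_dependency_hint(index_hint, artifact_requires):
    """Compare the index's dependency hint with the artifact's own metadata.

    index_hint        -- requirement strings the index claimed (untrusted hint)
    artifact_requires -- requirement strings read from inside the downloaded,
                         already-validated distribution file

    Returns (only_in_hint, only_in_artifact). Both lists are empty when the
    hint agrees with the validated metadata.
    """
    hinted = {_normalize(r) for r in index_hint}
    actual = {_normalize(r) for r in artifact_requires}
    return sorted(hinted - actual), sorted(actual - hinted)


if __name__ == "__main__":
    extra, missing = compare_dependency_hint(["six", "evil-pkg"], ["six"])
    if extra or missing:
        print("index hint disagrees with validated metadata:", extra, missing)
```

Anything that shows up only in the hint is exactly the "valid but malicious fake dependency" case, so an installer following this approach would refuse to install it rather than trust the index.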
>
> This notion isn't fully evolved yet (that's why this email is so long), but it feels like a far more viable direction to me than the idea of pushing the enhanced security management problem back on to end users.
>
> Suppose, for example, there were additional independently managed validation services hosting TUF metadata for various subsets of PyPI. The enhanced security model would then involve developers opting in to uploading their package metadata to one or more of the validation servers, rather than just to the main PyPI server. pip itself wouldn't worry about checking the validation services - it would just check against the main server as it does today, so we wouldn't need to worry about how we get the root keys for the validation servers out to arbitrary client end points.
>
> That is, rather than "sign your own packages", the enhanced security model becomes "get multiple entities to sign your packages, so compromise of any one entity (including PyPI itself) can be detected and investigated appropriately".
>
> The validation services would then be responsible for checking that their own registered metadata matched the metadata being published on PyPI. If they detect a discrepancy between their own metadata and PyPI's, then we'd have a human-in-the-loop process for reporting the problem, and the most likely response would be to disable PyPI downloads while the situation was resolved.
>
> I believe something like that would change the threat landscape in a positive way, and has three very attractive features over distributed signing authority:
>
> * It's completely transparent at the point of installation - it transforms PEP 480 into a back end data integrity validation project, rather than something that affects the end user experience of the PyPI ecosystem. The changes to the installation experience would be completely covered by PEP 458.
> * Uploading metadata to additional servers for signing is relatively low impact on developers (if they have an automated release process, it's likely just another line in a script somewhere), significantly lowering barriers to adoption relative to asking developers to sign their own packages.
> * Folks that decide to run or use a validation server are likely going to be more closely engaged with the PyPI community, and hence easier to reach as the metadata requirements evolve.
>
> In terms of how I believe such a change would mitigate the threat of a PyPI compromise:
>
> * it provides a cryptographically validated way to detect a compromise of any packages registered with one or more validation services, significantly reducing the likelihood of a meaningful PyPI compromise going undetected
> * in any subsequent investigation, we'd have multiple sets of cryptographically validated metadata to compare to identify exactly what was compromised, and how it was compromised
> * the new attack vectors introduced (by compromising the validation services rather than PyPI itself) are denial of service attacks (due to PyPI downloads being disabled while the discrepancy is investigated), rather than the artifact substitution that is possible by attacking PyPI directly
>
> That means we would move from the status quo, where a full PyPI compromise may permit silent substitution of artifacts, to one where an illicit online package substitution would likely be detected in minutes or hours for high profile projects, so the likely pay-off for an attack on the central infrastructure is a denial of service against organisations not using their own local PyPI mirrors, rather than arbitrary software installation on a wide range of systems.
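
To make the validation-service check Nick describes a bit more concrete, here is a minimal sketch, assuming the service simply keeps a SHA-256 digest for every file a developer registered with it and compares that against whatever the index currently publishes. `Discrepancy`, `sha256_hex` and `find_discrepancies` are hypothetical names for illustration; a real service would presumably compare signed TUF metadata rather than bare hash dictionaries.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    project: str
    filename: str
    reason: str


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def find_discrepancies(registered, published):
    """Compare validator-registered digests with what the index publishes.

    Both arguments map project name -> {filename: sha256 hex digest}.
    Only projects registered with this validator are checked; everything
    else on the index is ignored.
    """
    problems = []
    for project, files in sorted(registered.items()):
        served = published.get(project, {})
        for filename, digest in sorted(files.items()):
            if filename not in served:
                problems.append(Discrepancy(project, filename, "missing from index"))
            elif served[filename] != digest:
                problems.append(Discrepancy(project, filename, "hash mismatch"))
        # Files the index serves for this project that were never registered.
        for filename in sorted(set(served) - set(files)):
            problems.append(
                Discrepancy(project, filename, "not registered with validator")
            )
    return problems


if __name__ == "__main__":
    registered = {"virtualenv": {"virtualenv-1.0.tar.gz": sha256_hex(b"real")}}
    published = {"virtualenv": {"virtualenv-1.0.tar.gz": sha256_hex(b"swapped")}}
    for problem in find_discrepancies(registered, published):
        print(problem)
```

The function only produces a report. In keeping with the human-in-the-loop process described above, deciding whether a discrepancy is an attack, a deleted release, or a developer who uploaded to the index before registering with the validator is left to people.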
>
> Another nice benefit of this approach is that it also protects against attacks on developer PyPI accounts, so long as they use different authentication mechanisms on the validation server over the main PyPI server. For example, larger organisations could run their own validation server for the packages they publish, and manage it using offline keys as recommended by TUF - that's a lot easier to do when you don't need to allow arbitrary uploads.
>
> Specific projects could still be attacked (by compromising developer systems), but that's not a new threat, and outside the scope of PEP 458/480: we're aiming to mitigate the threat of systemic compromise that currently makes PyPI a relatively attractive target.
>
> As far as the pragmatic aspects go, we could either go with a model where projects are encouraged to run their own validation services on something like OpenShift (or even a static hosting site if they generate their validation metadata locally), or else we could look for willing partners to host public PyPI metadata validation servers (e.g. the OpenStack Foundation, Fedora/Red Hat, perhaps someone from the Debian/Ubuntu/Canonical ecosystem, perhaps some of the other commercial Python redistributors).
>
> Regards,
> Nick.
>
> [1] Via Leigh Alexander, I was recently introduced to this excellent paper on understanding and working with the mental threat models that users actually have, rather than attempting to educate the users: https://cups.cs.cmu.edu/soups/2010/proceedings/a11_Walsh.pdf. While the paper is specifically written in the context of home PC security, I think that's good advice in general: adjusting software systems to accommodate the reality of human behaviour is usually going to be far more effective than attempting to teach humans to conform to the current needs of the software.
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at https://mail.python.org/archives/list/distutils-sig@python.org/message/SEJAQDLMDMLOXE6WQRK3IGOIY6XBJY2F/