[Distutils] Re: Surviving a Compromise of PyPI - PEP 458 and 480

Brett Cannon Tue, 24 Dec 2019 11:12:55 -0800

I'll just say that when I blogged about PyPI security at 
https://snarky.ca/how-do-you-verify-pypi-can-be-trusted/ the idea Nick is 
proposing was along the lines of what I thought about as a solution to the 
non-typosquatting side of the security problem. Which is to say, it should be 
easy enough to explain to non-security folk if I could figure it out. ;)


Nick Coghlan wrote:
> On 1 January 2015 at 05:51, Donald Stufft don...@stufft.io wrote:
> >
> > So here is my problem. I’m completely on board with
> > the developer signing
> > for the distribution files. I think that makes total sense. However I worry
> > that requiring the developer to sign for what is essentially the
> > “installer” API (aka how pip discovers things to install) is going to put
> > us in a situation where we cannot evolve the API easily. If we modified
> > this PEP so that an online key signed for /simple/ what security properties
> > would we lose?
> > It appears to me that the problem then would be that a compromise of
> > PyPI can present whatever information they want to pip as to what is
> > available for pip to download and install. This would mean freeze attacks,
> > mix and match attacks. It would also mean that they could, in a future
> > world where pip can use metadata on PyPI to do dependency resolution, tell
> > pip that it needs to download a valid but malicious project as a dependency
> > of a popular project like virtualenv.
> > However I don’t think they’d be able to actually cause pip to install a
> > malicious copy of a good project and I believe that we can protect against
> > an attacker who possesses that key from tricking pip into installing a
> > malicious but valid project as a fake dependency by having pip only use the
> > theoretical future PyPI metadata that lists dependencies as an optimization
> > hint for what it should download and then once it’s actually downloaded a
> > project like virtualenv (which has been validated to be from the real
> > author) peek inside that file and ensure that the metadata inside that
> > matches what PyPI told pip.
> > Is my assessment correct? Is keeping the “API” under control of PyPI a
> > reasonable thing to do while keeping the actual distribution files
> > themselves under control of the distribution authors? The reason this
> > worries me is that unlike a Linux distribution or an application like
> > Firefox or so we don’t have much of a relationship with the people who are
> > uploading things to PyPI. So if we need to evolve the API we are not going
> > to be able to compel our authors to go back and re-generate new signed
> > metadata.
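
A minimal sketch of the "hint, then verify" check Donald describes above, assuming the dependency names have already been extracted from both the index and the author-signed artifact. The function name and the result layout are hypothetical and are not part of pip.

```python
def check_dependency_hint(index_hint, verified_metadata):
    """Compare dependencies advertised by the index with those in the artifact.

    index_hint: dependency names the (untrusted) index told the installer about.
    verified_metadata: dependency names read from the downloaded file, after
    its author signature has been checked.
    """
    hinted = {name.lower() for name in index_hint}
    actual = {name.lower() for name in verified_metadata}
    return {
        # Advertised by the index but absent from the signed artifact:
        # a possible fake dependency injected by a compromised index.
        "unexpected": sorted(hinted - actual),
        # Present in the signed artifact but not advertised by the index.
        "missing": sorted(actual - hinted),
    }


# Hypothetical example: the index claims an extra dependency.
result = check_dependency_hint(["six", "evil-package"], ["six"])
print(result)
```

Any non-empty "unexpected" entry would be a reason to refuse the install rather than trust the index.
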
> I think this is a good entry point for an idea I've had kicking around in
> my brain for the past couple of days: what if we change the end goal of PEP
> 480 slightly, from "prevent attackers from compromising published PyPI
> metadata" to "allow developers & administrators to rapidly detect and
> recover from compromised PyPI metadata"?
> My reasoning is that when it comes to PyPI security, there are actually two
> major dials we can twiddle:
> 
> * raising the cost of an attack (e.g. making compromise harder by
>   distributing signing authority to developers)
> * reducing the benefit of an attack (e.g. making the expected duration, and
>   hence reach, of a compromise lower, or downgrading an artifact substitution
>   attack to a denial of service attack)
> 
> To raise the cost of a compromise through distributed signing authority, we
> have to solve the trust management problem - getting developer keys out to
> end users in a way that doesn't involve trusting the central PyPI service.
> That's actually a really difficult problem to solve, which is why we have
> situations like TLS still relying on the CA system, despite the known
> problems with the latter.
> However, the second objective (reducing the benefit of an attack) is
> potentially more tractable: we wouldn't need to distribute trust
> management out to arbitrary end users, we'd "just"
> need a federated group of entities that are in a position to detect that
> PyPI has potentially been compromised, and request a service shutdown until
> such time as the compromise has been investigated and resolved.
> This notion isn't fully evolved yet (that's why this email is so long), but
> it feels like a far more viable direction to me than the idea of pushing
> the enhanced security management problem back on to end users.
> Suppose, for example, there were additional independently managed
> validation services hosting TUF metadata for various subsets of PyPI. The
> enhanced security model would then involve developers opting in to
> uploading their package metadata to one or more of the validation servers,
> rather than just to the main PyPI server. pip itself wouldn't worry about
> checking the validation services - it would just check against the main
> server as it does today, so we wouldn't need to worry about how we get the
> root keys for the validation servers out to arbitrary client end points.
> That is, rather than "sign your own packages", the enhanced security model
> becomes "get multiple entities to sign your packages, so compromise of any
> one entity (including PyPI itself) can be detected and investigated
> appropriately".
> The validation services would then be responsible for checking that their
> own registered metadata matched the metadata being published on PyPI. If
> they detect a discrepancy between their own metadata and PyPI's, then we'd
> have a human-in-the-loop process for reporting the problem, and the most
> likely response would be to disable PyPI downloads while the situation was
> resolved.
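
The discrepancy check described above could look roughly like this. It is only a sketch: the metadata layout (a dict per release file), the canonical JSON digest, and treating a file missing from PyPI as a discrepancy are all assumptions, not anything specified by TUF or the PEPs.

```python
import hashlib
import json


def digest(metadata):
    """Hash a metadata dict in a canonical form so key order does not matter."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_discrepancies(registered, published):
    """Return file names whose published metadata differs from the registered copy.

    registered: file name -> metadata the developer uploaded to the validator.
    published: file name -> metadata currently served by the main index.
    """
    mismatched = []
    for filename, meta in registered.items():
        served = published.get(filename)
        # Assumption: a registered file that the index no longer serves is
        # also worth reporting to a human.
        if served is None or digest(served) != digest(meta):
            mismatched.append(filename)
    return sorted(mismatched)


# Hypothetical data: the index serves a different hash for one file.
registered = {"example-1.0.tar.gz": {"sha256": "aaa", "size": 100}}
published = {"example-1.0.tar.gz": {"sha256": "bbb", "size": 100}}
mismatches = find_discrepancies(registered, published)
print(mismatches)
```

A non-empty result would feed the human-in-the-loop reporting step rather than trigger any automatic action.
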
> I believe something like that would change the threat landscape in a
> positive way, and has three very attractive features over distributed
> signing authority:
> 
> * It's completely transparent at the point of installation - it transforms
>   PEP 480 into a back end data integrity validation project, rather than
>   something that affects the end user experience of the PyPI ecosystem. The
>   changes to the installation experience would be completely covered by PEP
>   458.
> * Uploading metadata to additional servers for signing is relatively low
>   impact on developers (if they have an automated release process, it's
>   likely just another line in a script somewhere), significantly lowering
>   barriers to adoption relative to asking developers to sign their own
>   packages.
> * Folks that decide to run or use a validation server are likely going to
>   be more closely engaged with the PyPI community, and hence easier to reach
>   as the metadata requirements evolve.
> 
> In terms of how I believe such a change would mitigate the threat of a PyPI
> compromise:
> 
> * it provides a cryptographically validated way to detect a compromise of
>   any packages registered with one or more validation services, significantly
>   reducing the likelihood of a meaningful PyPI compromise going undetected
> * in any subsequent investigation, we'd have multiple sets of
>   cryptographically validated metadata to compare to identify exactly what
>   was compromised, and how it was compromised
> * the new attack vectors introduced (by compromising the validation
>   services rather than PyPI itself) are denial of service attacks (due to
>   PyPI downloads being disabled while the discrepancy is investigated),
>   rather than the artifact substitution that is possible by attacking PyPI
>   directly
> 
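
To illustrate how multiple sets of metadata could help an investigation narrow down what was compromised, here is a rough sketch that flags the sources disagreeing with the majority for a single release file. The majority rule and the source names are assumptions made for illustration; the proposal above only calls for a human-in-the-loop review.

```python
from collections import Counter


def suspect_sources(views):
    """Return the sources whose digest disagrees with the majority.

    views: source name -> digest that source holds for one release file.
    If no digest is held by a strict majority, every source is returned,
    because nothing can be ruled out without a manual investigation.
    """
    counts = Counter(views.values())
    majority_digest, majority_count = counts.most_common(1)[0]
    if majority_count <= len(views) / 2:
        return sorted(views)
    return sorted(name for name, d in views.items() if d != majority_digest)


# Hypothetical views of one file: the main index disagrees with two validators.
views = {"pypi": "bad", "validator-a": "good", "validator-b": "good"}
suspects = suspect_sources(views)
print(suspects)
```

With only two sources, a mismatch says that something is wrong but not which side; that is the denial of service case rather than a silent substitution.
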
> That means we would move from the status quo, where a full PyPI compromise
> may permit silent substitution of artifacts to one where an illicit online
> package substitution would likely be detected in minutes or hours for high
> profile projects, so the likely pay-off for an attack on the central
> infrastructure is a denial of service against organisations not using their
> own local PyPI mirrors, rather than arbitrary software installation on a
> wide range of systems.
> Another nice benefit of this approach is that it also protects against
> attacks on developer PyPI accounts, so long as they use different
> authentication mechanisms on the validation server over the main PyPI
> server. For example, larger organisations could run their own validation
> server for the packages they publish, and manage it using offline keys as
> recommended by TUF - that's a lot easier to do when you don't need to allow
> arbitrary uploads.
> Specific projects could still be attacked (by compromising developer
> systems), but that's not a new threat, and outside the scope of PEP 458/480 -
> we're aiming to mitigate the threat of systemic compromise that
> currently makes PyPI a relatively attractive target.
> 
> As far as the pragmatic aspects go, we could either go with a model where
> projects are encouraged to run their own validation services on something
> like OpenShift (or even a static hosting site if they generate their
> validation metadata locally), or else we could look for willing partners to
> host public PyPI metadata validation servers (e.g. the OpenStack
> Foundation, Fedora/Red Hat, perhaps someone from the
> Debian/Ubuntu/Canonical ecosystem, perhaps some of the other commercial
> Python redistributors).
> Regards,
> Nick.
> [1] Via Leigh Alexander, I was recently introduced to this excellent paper
> on understanding and working with the mental threat models that users
> actually have, rather than attempting to educate the users:
> https://cups.cs.cmu.edu/soups/2010/proceedings/a11_Walsh.pdf.
> While the paper is specifically written in the context of home PC security,
> I think
> that's good advice in general: adjusting software systems to accommodate
> the reality of human behaviour is usually going to be far more effective
> than attempting to teach humans to conform to the current needs of the
> software.
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/SEJAQDLMDMLOXE6WQRK3IGOIY6XBJY2F/
