Re: 2018.01.09 Issue with TLS-SNI-01 and Shared Hosting Infrastructure

Ryan Sleevi via dev-security-policy Thu, 11 Jan 2018 13:36:55 -0800

On Wed, Jan 10, 2018 at 4:33 AM, josh--- via dev-security-policy <
[email protected]> wrote:


> At approximately 5 p.m. Pacific time on January 9, 2018, we received a
> report from Frans Rosén of Detectify outlining a method of exploiting some
> shared hosting infrastructures to obtain certificates for domains he did
> not control, by making use of the ACME TLS-SNI-01 challenge type. We
> quickly confirmed the issue and mitigated it by entirely disabling
> TLS-SNI-01 validation in Let’s Encrypt. We’re grateful to Frans for finding
> this issue and reporting it to us.
>
> We’d like to describe the issue and our plans for possibly re-enabling
> TLS-SNI-01 support.
>
> Problem Summary
>
> In the ACME protocol’s TLS-SNI-01 challenge, the ACME server (the CA)
> validates a domain name by generating a random token and communicating it
> to the ACME client. The ACME client uses that token to create a self-signed
> certificate with a specific, invalid hostname (for example,
> 773c7d.13445a.acme.invalid), and configures the web server on the domain
> name being validated to serve that certificate. The ACME server then looks
> up the domain name’s IP address, initiates a TLS connection, and sends the
> specific .acme.invalid hostname in the SNI extension. If the response is a
> self-signed certificate containing that hostname, the ACME client is
> considered to be in control of the domain name, and will be allowed to
> issue certificates for it.
>
> However, Frans noticed that at least two large hosting providers combine
> two properties that together violate the assumptions behind TLS-SNI:
>
> * Many users are hosted on the same IP address, and
> * Users have the ability to upload certificates for arbitrary names
> without proving domain control.
>
> When both are true of a hosting provider, an attack is possible. Suppose
> example.com’s DNS is pointed at the same shared hosting IP address as a
> site controlled by the attacker. The attacker can run an ACME client to get
> a TLS-SNI-01 challenge, then install their .acme.invalid certificate on the
> hosting provider. When the ACME server looks up example.com, it will
> connect to the hosting provider’s IP address and use SNI to request the
> .acme.invalid hostname. The hosting provider will serve the certificate
> uploaded by the attacker. The ACME server will then consider the attacker’s
> ACME client authorized to issue certificates for example.com, and be
> willing to issue a certificate for example.com even though the attacker
> doesn’t actually control it.
>
> This issue only affects domain names that use hosting providers with the
> above combination of properties. It is independent of whether the hosting
> provider itself acts as an ACME client.
>
> Our Plans
>
> Shortly after the issue was reported, we disabled TLS-SNI-01 in Let’s
> Encrypt. However, a large number of people and organizations use the
> TLS-SNI-01 challenge type to get certificates. It’s important that we
> restore service if possible, though we will only do so if we’re confident
> that the TLS-SNI-01 challenge type is sufficiently secure.
>
> At this time, we believe that the issue can be addressed by having certain
> service providers implement stronger controls for domains hosted on their
> infrastructure. We have been in touch with the providers we know to be
> affected, and mitigations will start being deployed for their systems
> shortly.
>
> Over the next 48 hours we will be building a list of vulnerable providers
> and their associated IP addresses. Our tentative plan, once the list is
> completed, is to re-enable the TLS-SNI-01 challenge type with vulnerable
> providers blocked from using it.
>
> We’re also going to be soliciting feedback on our plans from our
> community, partners and other PKI stakeholders prior to re-enabling the
> TLS-SNI-01 challenge. There is a lot to consider here and we’re looking
> forward to feedback.
>
> We will post more information and details as our plans progress.
>

(Wearing a Google Chrome hat on behalf of our root store policy)

Josh,

Thanks for bringing this rapidly to the attention of the broader community
and proactively reaching out to root programs.

As framing for the discussion, we still believe TLS-SNI is fully permitted
by the Baseline Requirements: while that is not ideal, issuance using this
method is still allowed. As such, the 'root' cause is that the Baseline
Requirements permit methods that are less secure than desired, and the
discussion that follows is now about what steps to take - as CAs, as Root
Programs, for site operators, and for the CA/Browser Forum.

When faced with a vulnerable validation method that is permitted, it's
always a challenge to balance the need for security - for sites and users -
with the risk of compatibility and breakage from the removal of such a
method. Fundamentally, the issues you raise call into question the level of
assurance of 3.2.2.4.9 and 3.2.2.4.10 in the Baseline Requirements. They are
not limited to TLS-SNI, and potentially affect every CA using these
methods.

When evaluating these methods, and their risks, compared to, say, the
also-weak 3.2.2.4.1 and 3.2.2.4.5 methods under ongoing discussion in the
CA/Browser Forum, a few key distinctions, although non-exhaustive, apply and
are factored in to our response and proposal here:

- The average lifetime of certificates using these methods, across CAs,
compared to 3.2.2.4.1/3.2.2.4.5, is significantly shorter - very close to
the 90 days that Let's Encrypt uses, based on the available information we
have. The fact that so many of these certificates are short lived creates a
situation in which there's simultaneously more risk to the ecosystem in
rapidly removing these methods as acceptable (due to the need to
obtain/renew certificates), while there's also much less risk in allowing
this method to continue to be used for a limited time, due to the fact that
certificates that could be obtained by exploiting this will expire much
sooner than the 2-3 years that many other certificates are issued with.
That is, the security risk of a bad validation that lives for 3 years is
much greater than the risk of a bad validation that lives for 90 days, and
the fact that the badness is only valid for 90 days means that it's easier
to let the method shut down gracefully than it would be if that implied
risk had to be accepted for years.

- The ease with which alternative methods can be adopted. Methods that are
manual are substantially easier to remove quickly, as alternative manual
processes can also be used during the human-to-human interaction, while
methods that are highly automated conversely create greater challenges, due
to the need to update client software to whatever new automated methods may
be used. While 3.2.2.4.1 and 3.2.2.4.5 are highly human-driven methods,
methods like 3.2.2.4.9 and .10 are designed for automation - which is why
we were supportive of their addition - but that also means any mitigations
will necessarily face ecosystem challenges, much like deploying new
versions of TLS or deprecating old ones.

- The ease with which alternative automated methods can be used. As
automated methods are generally designed around integrated systems and
certain design constraints, it's not always possible to move to an
equivalently automatable method (as it is with manual methods), and it may
be that no equivalent automated method exists to fill the design niche. If
that design niche is a substantial one for clients, and enables otherwise
unautomatable systems, prematurely removing the method can pose greater
risk. Specific applications of the .9 and .10 methods, such as ACME's
TLS-SNI, occupy an important niche: similar to the 3.2.2.4.6 method and
ACME's HTTP-01 method, they provide a level of automation for systems not
directly integrated with DNS. While that means they must be particularly
attentive to the security risks that come with it, done correctly, they can
provide a greater path towards security.

- Compared to 3.2.2.4.1 and 3.2.2.4.5, specific applications of 3.2.2.4.9
and 3.2.2.4.10 can be evaluated against possible mitigations for the risk,
both short- and long-term, and steps that site operators can take to
affirmatively protect themselves offer better assurances than those that
rely entirely on the CA's good behaviour. As you call out, the specific
risks of TLS-SNI are limited to shared providers (not individual users)
that meet certain conditions, and these shared providers can already take
existing steps to minimize the immediate risk, such as blocking the use of
certificates or SNI negotiations that contain the '.invalid' TLD. While
this is not an ideal long-term solution, by any means, it allows us to
frame both the immediate and specific risks and the ways to reduce them.
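
To make the shared-provider conditions and the '.invalid' mitigation
concrete, here is a minimal Python sketch of a toy model. It is an
illustration under stated assumptions, not how any real provider or CA is
implemented: SharedHost, challenge_passes and is_reserved_invalid_name are
hypothetical names, and the model skips the actual TLS handshake and the
inspection of the certificate's contents.

```python
from dataclasses import dataclass, field


def is_reserved_invalid_name(hostname: str) -> bool:
    """True if hostname is, or falls under, the reserved '.invalid' TLD."""
    name = hostname.rstrip(".").lower()
    return name == "invalid" or name.endswith(".invalid")


@dataclass
class SharedHost:
    """Toy model of one shared IP address serving many customers (hypothetical)."""
    block_invalid: bool = False
    certs: dict = field(default_factory=dict)  # SNI name -> uploading customer

    def upload_cert(self, customer: str, hostname: str) -> bool:
        # Mitigation: refuse uploads that name anything under '.invalid'.
        if self.block_invalid and is_reserved_invalid_name(hostname):
            return False
        # Vulnerable behaviour: no proof of domain control is required.
        self.certs[hostname.lower()] = customer
        return True

    def serve(self, sni: str):
        # Mitigation: refuse handshakes whose SNI is under '.invalid'.
        if self.block_invalid and is_reserved_invalid_name(sni):
            return None
        return self.certs.get(sni.lower())


def challenge_passes(host: SharedHost, challenge_name: str) -> bool:
    """Stand-in for the CA's check: did the shared IP answer the challenge SNI?"""
    return host.serve(challenge_name) is not None


challenge = "773c7d.13445a.acme.invalid"

vulnerable = SharedHost()
vulnerable.upload_cert("attacker", challenge)
attack_succeeds = challenge_passes(vulnerable, challenge)

mitigated = SharedHost(block_invalid=True)
mitigated.upload_cert("attacker", challenge)
attack_blocked = not challenge_passes(mitigated, challenge)
```

In this model the attacker's upload satisfies the challenge on the
unmitigated host and fails on the mitigated one. Note that this is the
"opt-out" style of protection: each provider has to deploy the filter
itself.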

For the sake of brevity, I'll end my comparisons there, but hopefully this
highlights some of the factors we've considered in our response to your
proposal.

Given the risks identified to 3.2.2.4.9 and 3.2.2.4.10, we think it would
be best for CAs using these Baseline Requirements-acceptable methods of
validation to begin immediately transitioning away from them, with the goal
of either removing them entirely from the Baseline Requirements, or
identifying ways in which .9 and .10 can be better specified to mitigate
such risks. That said, given the potential risks to the ecosystem,
particularly for those with pre-existing short-lived certificates, we're
open, provided that the new certificates are valid for 90 days or less, to
allowing the specific TLS-SNI methods identified by the ACME specification
to continue to be used for a limited time, while the broader community
works to identify potential mitigations (if possible) or to transition away
from these methods.
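
As a rough illustration of the "90 days or less" condition, the following
Python sketch checks a certificate's validity window against that limit.
The function name, the MAX_LIFETIME constant, and the choice to measure the
window as notAfter minus notBefore are assumptions made for this example,
not wording from the Baseline Requirements or any root program policy.

```python
from datetime import datetime, timedelta

# Assumed limit, taken from the "90 days or less" condition discussed above.
MAX_LIFETIME = timedelta(days=90)


def within_short_lived_limit(not_before: datetime,
                             not_after: datetime,
                             limit: timedelta = MAX_LIFETIME) -> bool:
    """True if the validity window is positive and no longer than the limit."""
    lifetime = not_after - not_before
    return timedelta(0) < lifetime <= limit


issued = datetime(2018, 1, 11)
short_lived_ok = within_short_lived_limit(issued, issued + timedelta(days=90))
three_year_ok = within_short_lived_limit(issued, issued + timedelta(days=3 * 365))
```

Here the 90-day certificate passes the check and the three-year one does
not.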

While we don't think the current status quo represents a viable long-term
solution, given that the ACME TLS-SNI methods have been broadly reviewed
within the IETF, that the risks apply to a limited subset of specific
infrastructures, that mitigations are possible for these infrastructures to
deploy, that Let's Encrypt is actively working with the community to
identify, and ideally, share, those that haven't or cannot deploy such
mitigations, and all of the other items previously mentioned, we think this
represents an appropriate short-term balance.

If and as new facts become available, it may be necessary to revisit this.
We may have overlooked additional risks, or failed to consider mitigating
factors. Further, this response is contextualized in the application of
ACME's TLS-SNI methods for validation, and such a response may not be
appropriate for other forms of validations within the framework of
3.2.2.4.9 and 3.2.2.4.10. Similarly, this response doesn't apply to
certificates that may be valid for longer periods, as they may present
substantially greater risk to making effective improvements to or an
orderly transition away from these methods.

We look forward to working with other browser vendors, site operators, and
the relying community to work out ways to provide an orderly and effective
transition to more secure methods - whether that means away from the
3.2.2.4.9/.10 series of domain validations, or to more restrictive forms
that are more clearly "opt-in" rather than the explicit "opt-out" proposed
(of 'blacklisting .invalid').

We're also curious if we've overlooked salient details in our response, and
thus welcome feedback from Let's Encrypt, other CAs utilizing these
validation methods (both TLS-SNI and 3.2.2.4.9 and 3.2.2.4.10), and the
broader community as to our proposed next steps. Please consider this a
draft response, and we look forward to future updates regarding proposed
next steps.