Re: 2018.01.09 Issue with TLS-SNI-01 and Shared Hosting Infrastructure

josh--- via dev-security-policy Fri, 12 Jan 2018 19:39:07 -0800

On Thursday, January 11, 2018 at 4:29:09 PM UTC-6, [email protected] wrote:
> On Thursday, January 11, 2018 at 3:36:50 PM UTC-6, Ryan Sleevi wrote:
> > On Wed, Jan 10, 2018 at 4:33 AM, josh--- via dev-security-policy <
> > [email protected]> wrote:
> > 
> > > At approximately 5 p.m. Pacific time on January 9, 2018, we received a
> > > report from Frans Rosén of Detectify outlining a method of exploiting some
> > > shared hosting infrastructures to obtain certificates for domains he did
> > > not control, by making use of the ACME TLS-SNI-01 challenge type. We
> > > quickly confirmed the issue and mitigated it by entirely disabling
> > > TLS-SNI-01 validation in Let’s Encrypt. We’re grateful to Frans for
> > > finding this issue and reporting it to us.
> > >
> > > We’d like to describe the issue and our plans for possibly re-enabling
> > > TLS-SNI-01 support.
> > >
> > > Problem Summary
> > >
> > > In the ACME protocol’s TLS-SNI-01 challenge, the ACME server (the CA)
> > > validates a domain name by generating a random token and communicating it
> > > to the ACME client. The ACME client uses that token to create a
> > > self-signed certificate with a specific, invalid hostname (for example,
> > > 773c7d.13445a.acme.invalid), and configures the web server on the domain
> > > name being validated to serve that certificate. The ACME server then looks
> > > up the domain name’s IP address, initiates a TLS connection, and sends the
> > > specific .acme.invalid hostname in the SNI extension. If the response is a
> > > self-signed certificate containing that hostname, the ACME client is
> > > considered to be in control of the domain name, and will be allowed to
> > > issue certificates for it.
> > >
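A minimal sketch of how the .acme.invalid name can be derived. It assumes the construction described in the ACME drafts of the time: Z is the lowercase hex SHA-256 digest of the key authorization, split into two 32-character labels. The key authorization string is a hypothetical placeholder. The example hostname quoted above appears to be abbreviated, so this will not reproduce it.

```python
import hashlib

def tls_sni_01_hostname(key_authorization: str) -> str:
    """Derive the SNI name for a TLS-SNI-01 challenge (assumed draft construction)."""
    # Z = lowercase hex SHA-256 of the key authorization (64 hex characters).
    z = hashlib.sha256(key_authorization.encode("utf-8")).hexdigest()
    # Two 32-character labels followed by the reserved suffix.
    return f"{z[:32]}.{z[32:]}.acme.invalid"

# Hypothetical key authorization (token + "." + account key thumbprint).
name = tls_sni_01_hostname("example-token.example-thumbprint")
print(name)
```

The client places this name in a self-signed certificate; the validator sends the same name as the SNI value and checks that the returned certificate contains it.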
> > > However, Frans noticed that at least two large hosting providers combine
> > > two properties that together violate the assumptions behind TLS-SNI:
> > >
> > > * Many users are hosted on the same IP address, and
> > > * Users have the ability to upload certificates for arbitrary names
> > > without proving domain control.
> > >
> > > When both are true of a hosting provider, an attack is possible. Suppose
> > > example.com’s DNS is pointed at the same shared hosting IP address as a
> > > site controlled by the attacker. The attacker can run an ACME client to
> > > get a TLS-SNI-01 challenge, then install their .acme.invalid certificate on
> > > the hosting provider. When the ACME server looks up example.com, it will
> > > connect to the hosting provider’s IP address and use SNI to request the
> > > .acme.invalid hostname. The hosting provider will serve the certificate
> > > uploaded by the attacker. The ACME server will then consider the
> > > attacker’s ACME client authorized to issue certificates for example.com, and be
> > > willing to issue a certificate for example.com even though the attacker
> > > doesn’t actually control it.
> > >
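To make the two properties concrete, here is a toy Python model (not any provider's real system) of a shared IP that selects certificates by SNI and lets any customer upload a certificate for any name. The IP address comes from the RFC 5737 documentation range, and the self-signed check the validator would also perform is omitted for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class SharedHost:
    """Toy shared-hosting IP: serves whichever uploaded cert matches the SNI."""
    certs_by_name: dict = field(default_factory=dict)

    def upload_cert(self, customer: str, name: str) -> None:
        # No domain-control check: this is the dangerous property.
        self.certs_by_name[name] = {"owner": customer, "name": name}

    def handshake(self, sni: str):
        # Return the certificate selected by the SNI value, if any.
        return self.certs_by_name.get(sni)

def validator_accepts(dns: dict, hosts: dict, domain: str, expected_sni: str) -> bool:
    """Simplified TLS-SNI-01 check: resolve the domain, connect, send SNI."""
    host = hosts[dns[domain]]
    cert = host.handshake(expected_sni)
    return cert is not None and cert["name"] == expected_sni

shared_ip = "203.0.113.10"           # documentation address (RFC 5737)
hosts = {shared_ip: SharedHost()}
dns = {"example.com": shared_ip}     # victim's site lives on the shared IP

# The attacker, another customer of the same host, uploads the challenge cert.
challenge_name = "773c7d.13445a.acme.invalid"
hosts[shared_ip].upload_cert("attacker", challenge_name)

print(validator_accepts(dns, hosts, "example.com", challenge_name))  # True
```

Nothing in the check ties the served certificate to the owner of example.com; it only proves that *someone* on that IP could make the host answer for the challenge name.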
> > > This issue only affects domain names that use hosting providers with the
> > > above combination of properties. It is independent of whether the hosting
> > > provider itself acts as an ACME client.
> > >
> > > Our Plans
> > >
> > > Shortly after the issue was reported, we disabled TLS-SNI-01 in Let’s
> > > Encrypt. However, a large number of people and organizations use the
> > > TLS-SNI-01 challenge type to get certificates. It’s important that we
> > > restore service if possible, though we will only do so if we’re confident
> > > that the TLS-SNI-01 challenge type is sufficiently secure.
> > >
> > > At this time, we believe that the issue can be addressed by having certain
> > > service providers implement stronger controls for domains hosted on their
> > > infrastructure. We have been in touch with the providers we know to be
> > > affected, and mitigations will start being deployed for their systems
> > > shortly.
> > >
> > > Over the next 48 hours we will be building a list of vulnerable providers
> > > and their associated IP addresses. Our tentative plan, once the list is
> > > completed, is to re-enable the TLS-SNI-01 challenge type with vulnerable
> > > providers blocked from using it.
> > >
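One way a provider blocklist like the one proposed could be applied, sketched with Python's standard ipaddress module. The networks below are documentation ranges standing in for the real list, which is not reproduced here. Refusing when *any* resolved address matches is an assumption, chosen because the validator may end up connecting to any of them.

```python
import ipaddress

# Hypothetical blocklist: documentation ranges stand in for real provider IPs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),  # RFC 5737 documentation range
    ipaddress.ip_network("2001:db8:1::/48"),  # inside RFC 3849 documentation range
]

def tls_sni_01_allowed(resolved_ips: list[str]) -> bool:
    """Refuse TLS-SNI-01 if any resolved address falls in a blocked network."""
    for ip_text in resolved_ips:
        ip = ipaddress.ip_address(ip_text)
        # An IPv4 address is never "in" an IPv6 network (and vice versa).
        if any(ip in net for net in BLOCKED_NETWORKS):
            return False
    return True

print(tls_sni_01_allowed(["192.0.2.1"]))                   # True
print(tls_sni_01_allowed(["192.0.2.1", "2001:db8:1::5"]))  # False
```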
> > > We’re also going to be soliciting feedback on our plans from our
> > > community, partners and other PKI stakeholders prior to re-enabling the
> > > TLS-SNI-01 challenge. There is a lot to consider here and we’re looking
> > > forward to feedback.
> > >
> > > We will post more information and details as our plans progress.
> > >
> > 
> > (Wearing a Google Chrome hat on behalf of our root store policy)
> > 
> > Josh,
> > 
> > Thanks for bringing this rapidly to the attention of the broader community
> > and proactively reaching out to root programs.
> > 
> > As framing to the discussion, we still believe TLS-SNI is fully permitted
> > by the Baseline Requirements, which, while not ideal, still permits
> > issuance using this method. As such, the 'root' cause is that the Baseline
> > Requirements permit methods that are less secure than desired, and the
> > discussion that follows is now around what steps to take - as CAs, as Root
> > Programs, for site operators, and for the CA/Browser Forum.
> > 
> > When faced with a vulnerable validation method that is permitted, it's
> > always a challenge to balance the need for security - for sites and users -
> > with the risk of compatibility and breakage from the removal of such a
> > method. Fundamentally, the issues you raise call into question the level of
> > assurance of 3.2.2.4.9 and 3.2.2.4.10 in the Baseline Requirements, and are
> > not limited to TLS-SNI, and potentially affect every CA using these
> > methods.
> > 
> > When evaluating these methods, and their risks, compared to, say, the
> > also-weak 3.2.2.4.1 and 3.2.2.4.5 discussions ongoing with the CA/Browser
> > Forum, a few key distinctions, although non-exhaustive, apply and are
> > factored into our response and proposal here:
> > 
> > - The average lifetime of certificates using these methods, across CAs,
> > compared to 3.2.2.4.1/3.2.2.4.5, is significantly shorter - very close to
> > the 90 days that Let's Encrypt uses, based on the available information we
> > have. The fact that so many of these certificates are short-lived creates a
> > situation where there's simultaneously more risk to the ecosystem in
> > rapidly removing these methods as acceptable (due to the need to
> > obtain/renew certificates), while there's also much less risk in allowing
> > this method to continue to be used for a limited time, due to the fact that
> > certificates that could be obtained by exploiting this will expire much
> > sooner than the 2-3 years that many other certificates are issued with.
> > That is, the security risk of a bad validation that lives for 3 years is
> > much greater than the risk of a bad validation that lives for 90 days, and
> > the fact that the badness is only valid for 90 days means that it's easier
> > to allow it to more gracefully shut down than potentially accepting that
> > implied risk for years.
> > 
> > - The availability of alternative methods. Methods that are manual are
> > substantially easier to remove quickly, as alternative manual processes can
> > also be used during the human-to-human interaction, while methods that are
> > highly automated conversely create greater challenges, due to the need to
> > update client software to whatever new automated methods may be used. While
> > 3.2.2.4.1 and 3.2.2.4.5 are highly human-driven methods, methods like
> > 3.2.2.4.9 and .10 are designed for automation - which is why we were supportive
> > of their addition - but also mean that any mitigations will necessarily
> > face ecosystem challenges, much like deploying new versions of TLS or
> > deprecating old ones.
> > 
> > - The ease with which alternative automated methods can be used. As automated
> > methods are generally designed around integrated systems and certain design
> > constraints, it's not always possible to move to an equivalently
> > automatable method (as it is with manual methods), and it may be that no
> > equivalent automated method exists to fill the design niche. If that design
> > niche is a substantial one for clients, and enables otherwise unautomatable
> > systems, it can pose greater risk in prematurely removing it. Specific
> > applications of the .9 and .10 methods, such as ACME's TLS-SNI, occupy an
> > important niche: like the 3.2.2.4.6 method and ACME's HTTP-01 method, they
> > provide a level of automation for systems not directly integrated with DNS,
> > and while that means they must be particularly attentive to the security
> > risks that come from that, done correctly, they can provide a greater path
> > towards security.
> > 
> > - Compared to 3.2.2.4.1 and 3.2.2.4.5, specific applications of 3.2.2.4.9
> > and 3.2.2.4.10 can be evaluated against possible mitigations for the risk,
> > both short- and long-term, and steps that site operators can take to
> > affirmatively protect themselves offer better assurances than those that
> > rely entirely on the CA's good behaviour. As you call out, the specific
> > risks of TLS-SNI are limited to shared providers (not individual users)
> > that meet certain conditions, and these shared providers can already take
> > existing steps to minimize the immediate risk, such as blocking the use of
> > certificates or SNI negotiations that contain the '.invalid' TLD. While
> > this is not an ideal long-term solution, by any means, it allows us to
> > frame both the immediate and specific risks and the ways to reduce that.
> > 
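The stopgap described above (refusing certificates or SNI negotiations that use the '.invalid' TLD) might look something like the following on a hosting provider's side. This is a hypothetical sketch: the function names are illustrative, not part of any real web server's API, and, as the thread notes, it is an opt-out measure rather than a long-term fix.

```python
def is_invalid_tld(name: str) -> bool:
    """True if a DNS name falls under the reserved .invalid TLD (RFC 6761)."""
    labels = name.rstrip(".").lower().split(".")
    return labels[-1] == "invalid"

def accept_customer_certificate(san_names: list[str]) -> bool:
    # Hypothetical upload hook: reject any certificate naming a .invalid host.
    return not any(is_invalid_tld(n) for n in san_names)

def choose_certificate(sni: str, certs_by_name: dict):
    # Hypothetical handshake hook: never answer SNI requests for .invalid names.
    if is_invalid_tld(sni):
        return None
    return certs_by_name.get(sni)

print(accept_customer_certificate(["www.example.com", "a.b.acme.invalid"]))  # False
print(choose_certificate("a.b.acme.invalid", {"a.b.acme.invalid": "cert"}))  # None
```

Either hook on its own breaks the attack in the shared-hosting scenario; applying both also covers certificates uploaded before the upload check was deployed.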
> > For the sake of brevity, I'll end my comparisons there, but hopefully this
> > highlights some of the factors we've considered in our response to your
> > proposal.
> > 
> > Given the risks identified to 3.2.2.4.9 and 3.2.2.4.10, we think it would
> > be best for CAs using these Baseline Requirements-acceptable methods of
> > validation to begin immediately transitioning away from them, with the goal
> > of either removing them entirely from the Baseline Requirements, or
> > identifying ways in which .9 and .10 can be better specified to mitigate
> > such risks. That said, given the potential risks to the ecosystem,
> > particularly those with pre-existing short-lived certificates, and
> > provided that the new certificates are valid for 90 days or less,
> > we're open to allowing the specific TLS-SNI methods identified by the ACME
> > specification to continue to be used for a limited time, while the broader
> > community works to identify potential mitigations (if possible) or
> > transition away from these methods.
> > 
> > While we don't think the current status quo represents a viable long-term
> > solution, given that the ACME TLS-SNI methods have been broadly reviewed
> > within the IETF, that the risks apply to a limited subset of specific
> > infrastructures, that mitigations are possible for these infrastructures to
> > deploy, that Let's Encrypt is actively working with the community to
> > identify, and ideally, share, those that haven't or cannot deploy such
> > mitigations, and all of the other items previously mentioned, we think this
> > represents an appropriate short-term balance.
> > 
> > If and as new facts become available, it may be necessary to revisit this.
> > We may have overlooked additional risks, or failed to consider mitigating
> > factors. Further, this response is contextualized in the application of
> > ACME's TLS-SNI methods for validation, and such a response may not be
> > appropriate for other forms of validations within the framework of
> > 3.2.2.4.9 and 3.2.2.4.10. Similarly, this response doesn't apply to
> > certificates that may be valid for longer periods, as they may present
> > substantially greater risk to making effective improvements to or an
> > orderly transition away from these methods.
> > 
> > We look forward to working with other browser vendors, site operators, and
> > the relying community to work out ways to provide an orderly and effective
> > transition to more secure methods - whether that means away from the
> > 3.2.2.4.9/.10 series of domain validations, or to more restrictive forms
> > that are more clearly "opt-in" rather than the explicit "opt-out" proposed
> > (of 'blacklisting .invalid').
> > 
> > We're also curious if we've overlooked salient details in our response, and
> > thus welcome feedback from Let's Encrypt, other CAs utilizing these
> > validation methods (both TLS-SNI and 3.2.2.4.9 and 3.2.2.4.10), and the
> > broader community as to our proposed next steps. Please consider this a
> > draft response, and we look forward to future updates regarding proposed
> > next steps.
> 
> We have published an update on our plans for TLS-SNI:
> 
> https://community.letsencrypt.org/t/2018-01-11-update-regarding-acme-tls-sni-and-shared-hosting-infrastructure/50188
> 
> The short summary is that we do not plan to generally re-enable TLS-SNI 
> validation, but we will introduce various forms of whitelists to limit impact 
> during our transition away from TLS-SNI.
> 
> Thanks to everyone for the feedback on this thread already. Let us know if 
> you have any questions or concerns.


Another update: the main change is that we have deployed patches to our CA
that allow TLS-SNI for both renewals and whitelisted accounts, as we said we
would in our previous update:

https://community.letsencrypt.org/t/tls-sni-challenges-disabled-for-most-new-issuance/50316
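A rough sketch of the gating rule described above, under stated assumptions. It treats a "renewal" as an order for exactly the same set of names as a previously issued certificate; that definition, the function names, and the account IDs are hypothetical and are not taken from the CA's actual code.

```python
from typing import Iterable

def is_renewal(names: Iterable[str], issued_name_sets: set) -> bool:
    """Assumption: a renewal requests exactly the same set of names as a
    certificate that was already issued (names stored lowercased)."""
    return frozenset(n.lower() for n in names) in issued_name_sets

def tls_sni_01_permitted(account_id: int, names: Iterable[str],
                         whitelisted_accounts: set, issued_name_sets: set) -> bool:
    """Allow TLS-SNI-01 only for whitelisted accounts or renewals."""
    return account_id in whitelisted_accounts or is_renewal(names, issued_name_sets)

# Hypothetical state for illustration.
issued = {frozenset({"example.com", "www.example.com"})}
whitelist = {1234}

print(tls_sni_01_permitted(42, ["www.example.com", "example.com"], whitelist, issued))  # True
print(tls_sni_01_permitted(42, ["new.example.org"], whitelist, issued))                # False
print(tls_sni_01_permitted(1234, ["new.example.org"], whitelist, issued))              # True
```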
_______________________________________________
dev-security-policy mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-security-policy
