Re: 2018.01.09 Issue with TLS-SNI-01 and Shared Hosting Infrastructure

josh--- via dev-security-policy Fri, 12 Jan 2018 20:14:45 -0800

On Friday, January 12, 2018 at 9:38:42 PM UTC-6, jo...@letsencrypt.org wrote:
> On Thursday, January 11, 2018 at 4:29:09 PM UTC-6, jo...@letsencrypt.org 
> wrote:
> > On Thursday, January 11, 2018 at 3:36:50 PM UTC-6, Ryan Sleevi wrote:
> > > On Wed, Jan 10, 2018 at 4:33 AM, josh--- via dev-security-policy <
> > > dev-security-policy@lists.mozilla.org> wrote:
> > > 
> > > > At approximately 5 p.m. Pacific time on January 9, 2018, we received a
> > > > report from Frans Rosén of Detectify outlining a method of exploiting 
> > > > some
> > > > shared hosting infrastructures to obtain certificates for domains he did
> > > > not control, by making use of the ACME TLS-SNI-01 challenge type. We
> > > > quickly confirmed the issue and mitigated it by entirely disabling
> > > > TLS-SNI-01 validation in Let’s Encrypt. We’re grateful to Frans for 
> > > > finding
> > > > this issue and reporting it to us.
> > > >
> > > > We’d like to describe the issue and our plans for possibly re-enabling
> > > > TLS-SNI-01 support.
> > > >
> > > > Problem Summary
> > > >
> > > > In the ACME protocol’s TLS-SNI-01 challenge, the ACME server (the CA)
> > > > validates a domain name by generating a random token and communicating 
> > > > it
> > > > to the ACME client. The ACME client uses that token to create a 
> > > > self-signed
> > > > certificate with a specific, invalid hostname (for example,
> > > > 773c7d.13445a.acme.invalid), and configures the web server on the domain
> > > > name being validated to serve that certificate. The ACME server then 
> > > > looks
> > > > up the domain name’s IP address, initiates a TLS connection, and sends 
> > > > the
> > > > specific .acme.invalid hostname in the SNI extension. If the response 
> > > > is a
> > > > self-signed certificate containing that hostname, the ACME client is
> > > > considered to be in control of the domain name, and will be allowed to
> > > > issue certificates for it.
> > > >
> > > > However, Frans noticed that at least two large hosting providers combine
> > > > two properties that together violate the assumptions behind TLS-SNI:
> > > >
> > > > * Many users are hosted on the same IP address, and
> > > > * Users have the ability to upload certificates for arbitrary names
> > > > without proving domain control.
> > > >
> > > > When both are true of a hosting provider, an attack is possible. Suppose
> > > > example.com’s DNS is pointed at the same shared hosting IP address as a
> > > > site controlled by the attacker. The attacker can run an ACME client to 
> > > > get
> > > > a TLS-SNI-01 challenge, then install their .acme.invalid certificate on 
> > > > the
> > > > hosting provider. When the ACME server looks up example.com, it will
> > > > connect to the hosting provider’s IP address and use SNI to request the
> > > > .acme.invalid hostname. The hosting provider will serve the certificate
> > > > uploaded by the attacker. The ACME server will then consider the 
> > > > attacker’s
> > > > ACME client authorized to issue certificates for example.com, and be
> > > > willing to issue a certificate for example.com even though the attacker
> > > > doesn’t actually control it.
> > > >
> > > > This issue only affects domain names that use hosting providers with the
> > > > above combination of properties. It is independent of whether the 
> > > > hosting
> > > > provider itself acts as an ACME client.
> > > >
> > > > Our Plans
> > > >
> > > > Shortly after the issue was reported, we disabled TLS-SNI-01 in Let’s
> > > > Encrypt. However, a large number of people and organizations use the
> > > > TLS-SNI-01 challenge type to get certificates. It’s important that we
> > > > restore service if possible, though we will only do so if we’re 
> > > > confident
> > > > that the TLS-SNI-01 challenge type is sufficiently secure.
> > > >
> > > > At this time, we believe that the issue can be addressed by having 
> > > > certain
> > > > > service providers implement stronger controls for domains hosted on 
> > > > their
> > > > infrastructure. We have been in touch with the providers we know to be
> > > > affected, and mitigations will start being deployed for their systems
> > > > shortly.
> > > >
> > > > Over the next 48 hours we will be building a list of vulnerable 
> > > > providers
> > > > and their associated IP addresses. Our tentative plan, once the list is
> > > > completed, is to re-enable the TLS-SNI-01 challenge type with vulnerable
> > > > providers blocked from using it.
> > > >
> > > > We’re also going to be soliciting feedback on our plans from our
> > > > community, partners and other PKI stakeholders prior to re-enabling the
> > > > TLS-SNI-01 challenge. There is a lot to consider here and we’re looking
> > > > forward to feedback.
> > > >
> > > > We will post more information and details as our plans progress.
> > > >
> > > 
> > > (Wearing a Google Chrome hat on behalf of our root store policy)
> > > 
> > > Josh,
> > > 
> > > Thanks for bringing this rapidly to the attention of the broader community
> > > and proactively reaching out to root programs.
> > > 
> > > As framing to the discussion, we still believe TLS-SNI is fully permitted
> > > by the Baseline Requirements, which, while not ideal, still permits
> > > issuance using this method. As such, the 'root' cause is that the Baseline
> > > Requirements permit methods that are less secure than desired, and the
> > > discussion that follows is now around what steps to take - as CAs, as Root
> > > Programs, for site operators, and for the CA/Browser Forum.
> > > 
> > > When faced with a vulnerable validation method that is permitted, it's
> > > always a challenge to balance the need for security - for sites and users 
> > > -
> > > with the risk of compatibility and breakage from the removal of such a
> > > method. Fundamentally, the issues you raise call into question the level 
> > > of
> > > assurance of 3.2.2.4.9 and 3.2.2.4.10 in the Baseline Requirements, and 
> > > are
> > > > not limited to TLS-SNI, and potentially affect every CA using these
> > > methods.
> > > 
> > > When evaluating these methods, and their risks, compared to, say, the
> > > also-weak 3.2.2.4.1 and 3.2.2.4.5 discussions ongoing with the CA/Browser
> > > Forum, a few key distinctions, although non-exhaustive, apply and are
> > > factored in to our response and proposal here:
> > > 
> > > - The average lifetime of certificates using these methods, across CAs,
> > > compared to 3.2.2.4.1/3.2.2.4.5, is significantly shorter - very close to
> > > the 90 days that Let's Encrypt uses, based on the available information we
> > > have. The fact that so many of these certificates are short lived creates 
> > > a
> > > > situation in which there's simultaneously more risk to the ecosystem to
> > > rapidly removing these methods as acceptable (due to the need to
> > > obtain/renew certificate), while there's also much less risk in allowing
> > > this method to continue to be used for a limited time, due to the fact 
> > > that
> > > certificates that could be obtained by exploiting this will expire much
> > > sooner than the 2-3 years that many other certificates are issued with.
> > > That is, the security risk of a bad validation that lives for 3 years is
> > > much greater than the risk of a bad validation that lives for 90 days, and
> > > the fact that the badness is only valid for 90 days means that it's easier
> > > to allow it to more gracefully shut down than potentially accepting that
> > > implied risk for years.
> > > 
> > > > - The ease with which alternative methods can be adopted. Methods that are manual are
> > > substantially easier to remove quickly, as alternative manual processes 
> > > can
> > > also be used during the human-to-human interaction, while methods that are
> > > highly automated conversely create greater challenges, due to the need to
> > > update client software to whatever new automated methods may be used. 
> > > While
> > > 3.2.2.4.1 and 3.2.2.4.5 are highly human-driven methods, methods like
> > > 3.2.2.4.9 and .10 are designed for automation - and why we were supportive
> > > of their addition - but also mean that any mitigations will necessarily
> > > face ecosystem challenges, much like deploying new versions of TLS or
> > > deprecating old ones.
> > > 
> > > > - The ease with which alternative automated methods can be used. As 
> > > automated
> > > methods are generally designed around integrated systems and certain 
> > > design
> > > constraints, it's not always possible to move to an equivalently
> > > automatable method (as it is with manual methods), and it may be that no
> > > equivalent automated method exists to fill the design niche. If that 
> > > design
> > > niche is a substantial one for clients, and enables otherwise 
> > > unautomatable
> > > systems, it can pose greater risk in prematurely removing it. Specific
> > > > applications of the .9 and .10 methods, such as ACME's TLS-SNI, occupy an
> > > > important niche: like the 3.2.2.4.6 method and ACME's HTTP-01 method,
> > > > they provide a level of automation for systems not directly integrated
> > > > with DNS,
> > > and while that means they must be particularly attentive to the security
> > > risks that come from that, done correctly, they can provide a greater path
> > > towards security.
> > > 
> > > - Compared to 3.2.2.4.1 and 3.2.2.4.5, specific applications of 3.2.2.4.9
> > > and 3.2.2.4.10 can be evaluated against possible mitigations for the risk,
> > > > both short- and long-term, and steps that site operators can take to
> > > affirmatively protect themselves offer better assurances than those that
> > > rely entirely on the CA's good behaviour. As you call out, the specific
> > > risks of TLS-SNI are limited to shared providers (not individual users)
> > > that meet certain conditions, and these shared providers can already take
> > > existing steps to minimize the immediate risk, such as blocking the use of
> > > certificates or SNI negotiations that contain the '.invalid' TLD. While
> > > this is not an ideal long-term solution, by any means, it allows us to
> > > frame both the immediate and specific risks and the ways to reduce that.
> > > 
> > > > For the sake of brevity, I'll end my comparisons there, but hopefully this
> > > highlights some of the factors we've considered in our response to your
> > > proposal.
> > > 
> > > Given the risks identified to 3.2.2.4.9 and 3.2.2.4.10, we think it would
> > > be best for CAs using these Baseline Requirements-acceptable methods of
> > > validation to begin immediately transitioning away from them, with the 
> > > goal
> > > of either removing them entirely from the Baseline Requirements, or
> > > identifying ways in which .9 and .10 can be better specified to mitigate
> > > such risks. That said, given the potential risks to the ecosystem,
> > > particularly those with pre-existing short-lived certificates, we think
> > > that, provided that the new certificates are valid for 90 days or less,
> > > we're open to allowing the specific TLS-SNI methods identified by the ACME
> > > specification to continue to be used for a limited time, while the broader
> > > community works to identify potential mitigations (if possible) or
> > > transition away from these methods.
> > > 
> > > While we don't think the current status quo represents a viable long-term
> > > solution, given that the ACME TLS-SNI methods have been broadly reviewed
> > > within the IETF, that the risks apply to a limited subset of specific
> > > infrastructures, that mitigations are possible for these infrastructures 
> > > to
> > > deploy, that Let's Encrypt is actively working with the community to
> > > identify, and ideally, share, those that haven't or cannot deploy such
> > > mitigations, and all of the other items previously mentioned, we think 
> > > this
> > > represents an appropriate short-term balance.
> > > 
> > > If and as new facts become available, it may be necessary to revisit this.
> > > We may have overlooked additional risks, or failed to consider mitigating
> > > factors. Further, this response is contextualized in the application of
> > > ACME's TLS-SNI methods for validation, and such a response may not be
> > > appropriate for other forms of validations within the framework of
> > > 3.2.2.4.9 and 3.2.2.4.10. Similarly, this response doesn't apply to
> > > certificates that may be valid for longer periods, as they may present
> > > substantially greater risk to making effective improvements to or an
> > > orderly transition away from these methods.
> > > 
> > > We look forward to working with other browser vendors, site operators, and
> > > the relying community to work out ways to provide an orderly and effective
> > > transition to more secure methods - whether that means away from the
> > > 3.2.2.4.9/.10 series of domain validations, or to more restrictive forms
> > > that are more clearly "opt-in" rather than the explicit "opt-out" proposed
> > > (of 'blacklisting .invalid').
> > > 
> > > We're also curious if we've overlooked salient details in our response, 
> > > and
> > > thus welcome feedback from Let's Encrypt, other CAs utilizing these
> > > validation methods (both TLS-SNI and 3.2.2.4.9 and 3.2.2.4.10), and the
> > > broader community as to our proposed next steps. Please consider this a
> > > draft response, and we look forward to future updates regarding proposed
> > > next steps.
> > 
> > We have published an update on our plans for TLS-SNI:
> > 
> > https://community.letsencrypt.org/t/2018-01-11-update-regarding-acme-tls-sni-and-shared-hosting-infrastructure/50188
> > 
> > The short summary is that we do not plan to generally re-enable TLS-SNI 
> > validation, but we will introduce various forms of whitelists to limit 
> > impact during our transition away from TLS-SNI.
> > 
> > Thanks to everyone for the feedback on this thread already. Let us know if 
> > you have any questions or concerns.
> 
> Another update, the main thing being that we have deployed patches to our CA 
> that allow TLS-SNI for both renewal and whitelisted accounts, as we said we 
> would in our previous update:
> 
> https://community.letsencrypt.org/t/tls-sni-challenges-disabled-for-most-new-issuance/50316


I would like to thank our community, including many people who read m.d.s.p., 
for helping with our response. This includes individuals in the PKI community, 
other CAs, hosting and infrastructure providers, corporate security teams, and 
root programs.

Our response depended on quickly consuming large amounts of information from 
different external sources. We sought outside opinions on our vulnerability 
analysis. We needed to know how widespread the problem was, how fast many 
different organizations could patch, and what the impact of disabling TLS-SNI 
for different periods of time would be. We also had compliance questions.

Community members and partners immediately stepped up to provide input, many in 
the middle of the night via both phone and email. We're very grateful and we'll 
pay it forward given the opportunity.
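
For readers coming to this thread later, the issue described in the quoted 
problem summary can be illustrated with a small toy model. This is only a 
sketch under stated assumptions, not the Let's Encrypt or ACME implementation: 
the SharedHost class, the ca_validate and attack_succeeds functions, and the 
way challenge_name derives the .acme.invalid hostname from the token are all 
hypothetical simplifications, and no real TLS or DNS is involved. The 
block_invalid flag models the short-term mitigation mentioned in the thread, 
where a provider refuses certificates or SNI values under the .invalid TLD.

```python
import hashlib
import secrets


def challenge_name(token: str) -> str:
    """Derive a stand-in .acme.invalid hostname from a token.

    Assumption: a simplified derivation for illustration only, not
    necessarily the exact one in the ACME draft.
    """
    digest = hashlib.sha256(token.encode()).hexdigest()
    return f"{digest[:32]}.{digest[32:64]}.acme.invalid"


class SharedHost:
    """Toy model of one IP address serving many users' sites."""

    def __init__(self, block_invalid: bool = False):
        # SNI name -> record with the uploading user and the cert's names.
        self.certs = {}
        self.block_invalid = block_invalid

    def upload_cert(self, user: str, name: str) -> bool:
        # The vulnerable property: no proof of control over `name` is needed.
        if self.block_invalid and name.endswith(".invalid"):
            return False
        self.certs[name] = {"owner": user, "names": {name}}
        return True

    def handshake(self, sni: str):
        # Return the names in the certificate served for this SNI, if any.
        if self.block_invalid and sni.endswith(".invalid"):
            return None
        cert = self.certs.get(sni)
        return cert["names"] if cert else None


def ca_validate(dns: dict, domain: str, token: str) -> bool:
    """CA side: reach the domain's host and check for the challenge cert."""
    host = dns[domain]  # stands in for resolving the domain's IP address
    expected = challenge_name(token)
    names = host.handshake(expected)
    return names is not None and expected in names


def attack_succeeds(block_invalid: bool) -> bool:
    """Attacker shares an IP with example.com and uploads a challenge cert."""
    host = SharedHost(block_invalid=block_invalid)
    dns = {"example.com": host, "attacker.example": host}  # same shared IP
    token = secrets.token_hex(16)  # token the CA gave the attacker's client
    host.upload_cert("attacker", challenge_name(token))
    return ca_validate(dns, "example.com", token)
```

In this model, attack_succeeds(False) returns True because the CA only checks 
that the shared IP address serves the expected certificate, not which user 
installed it. attack_succeeds(True) returns False because the host refuses the 
.invalid name.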
_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy
