On Wed, Jan 10, 2018 at 4:33 AM, josh--- via dev-security-policy < [email protected]> wrote:
> At approximately 5 p.m. Pacific time on January 9, 2018, we received a report from Frans Rosén of Detectify outlining a method of exploiting some shared hosting infrastructures to obtain certificates for domains he did not control, by making use of the ACME TLS-SNI-01 challenge type. We quickly confirmed the issue and mitigated it by entirely disabling TLS-SNI-01 validation in Let’s Encrypt. We’re grateful to Frans for finding this issue and reporting it to us.
>
> We’d like to describe the issue and our plans for possibly re-enabling TLS-SNI-01 support.
>
> Problem Summary
>
> In the ACME protocol’s TLS-SNI-01 challenge, the ACME server (the CA) validates a domain name by generating a random token and communicating it to the ACME client. The ACME client uses that token to create a self-signed certificate with a specific, invalid hostname (for example, 773c7d.13445a.acme.invalid), and configures the web server on the domain name being validated to serve that certificate. The ACME server then looks up the domain name’s IP address, initiates a TLS connection, and sends the specific .acme.invalid hostname in the SNI extension. If the response is a self-signed certificate containing that hostname, the ACME client is considered to be in control of the domain name, and will be allowed to issue certificates for it.
>
> However, Frans noticed that at least two large hosting providers combine two properties that together violate the assumptions behind TLS-SNI:
>
> * Many users are hosted on the same IP address, and
> * Users have the ability to upload certificates for arbitrary names without proving domain control.
>
> When both are true of a hosting provider, an attack is possible. Suppose example.com’s DNS is pointed at the same shared hosting IP address as a site controlled by the attacker.
> The attacker can run an ACME client to get a TLS-SNI-01 challenge, then install their .acme.invalid certificate on the hosting provider. When the ACME server looks up example.com, it will connect to the hosting provider’s IP address and use SNI to request the .acme.invalid hostname. The hosting provider will serve the certificate uploaded by the attacker. The ACME server will then consider the attacker’s ACME client authorized to issue certificates for example.com, and be willing to issue a certificate for example.com even though the attacker doesn’t actually control it.
>
> This issue only affects domain names that use hosting providers with the above combination of properties. It is independent of whether the hosting provider itself acts as an ACME client.
>
> Our Plans
>
> Shortly after the issue was reported, we disabled TLS-SNI-01 in Let’s Encrypt. However, a large number of people and organizations use the TLS-SNI-01 challenge type to get certificates. It’s important that we restore service if possible, though we will only do so if we’re confident that the TLS-SNI-01 challenge type is sufficiently secure.
>
> At this time, we believe that the issue can be addressed by having certain services providers implement stronger controls for domains hosted on their infrastructure. We have been in touch with the providers we know to be affected, and mitigations will start being deployed for their systems shortly.
>
> Over the next 48 hours we will be building a list of vulnerable providers and their associated IP addresses. Our tentative plan, once the list is completed, is to re-enable the TLS-SNI-01 challenge type with vulnerable providers blocked from using it.
>
> We’re also going to be soliciting feedback on our plans from our community, partners and other PKI stakeholders prior to re-enabling the TLS-SNI-01 challenge. There is a lot to consider here and we’re looking forward to feedback.
>
> We will post more information and details as our plans progress.
>

(Wearing a Google Chrome hat on behalf of our root store policy)

Josh,

Thanks for bringing this rapidly to the attention of the broader community and proactively reaching out to root programs.

As framing for the discussion, we still believe TLS-SNI is fully permitted by the Baseline Requirements, which, while not ideal, still permit issuance using this method. As such, the 'root' cause is that the Baseline Requirements permit methods that are less secure than desired, and the discussion that follows is now around what steps to take - as CAs, as Root Programs, for site operators, and for the CA/Browser Forum.

When faced with a vulnerable validation method that is permitted, it's always a challenge to balance the need for security - for sites and users - with the risk of compatibility and breakage from the removal of such a method. Fundamentally, the issues you raise call into question the level of assurance of 3.2.2.4.9 and 3.2.2.4.10 in the Baseline Requirements; they are not limited to TLS-SNI, and potentially affect every CA using these methods.

When evaluating these methods, and their risks, compared to, say, the also-weak 3.2.2.4.1 and 3.2.2.4.5 methods under ongoing discussion in the CA/Browser Forum, a few key distinctions, although non-exhaustive, apply and are factored into our response and proposal here:

- The average lifetime of certificates using these methods, across CAs, compared to 3.2.2.4.1/3.2.2.4.5, is significantly shorter - very close to the 90 days that Let's Encrypt uses, based on the available information we have.
The fact that so many of these certificates are short-lived creates a situation where there's simultaneously more risk to the ecosystem in rapidly removing these methods as acceptable (due to the need to obtain/renew certificates), while there's also much less risk in allowing this method to continue to be used for a limited time, because certificates that could be obtained by exploiting this will expire much sooner than the 2-3 years that many other certificates are issued with. That is, the security risk of a bad validation that lives for 3 years is much greater than the risk of a bad validation that lives for 90 days, and the fact that the badness is only valid for 90 days means that it's easier to allow it to shut down more gracefully than to accept that implied risk for years.

- The availability of alternative methods. Methods that are manual are substantially easier to remove quickly, as alternative manual processes can also be used during the human-to-human interaction, while methods that are highly automated conversely create greater challenges, due to the need to update client software to whatever new automated methods may be used. While 3.2.2.4.1 and 3.2.2.4.5 are highly human-driven methods, methods like 3.2.2.4.9 and .10 are designed for automation - which is why we were supportive of their addition - but that also means that any mitigations will necessarily face ecosystem challenges, much like deploying new versions of TLS or deprecating old ones.

- The ease with which alternative automated methods can be used. As automated methods are generally designed around integrated systems and certain design constraints, it's not always possible to move to an equivalently automatable method (as it is with manual methods), and it may be that no equivalent automated method exists to fill the design niche.
If that design niche is a substantial one for clients, and enables otherwise unautomatable systems, it can pose greater risk to remove it prematurely. Specific applications of the .9 and .10 methods, such as ACME's TLS-SNI, occupy an important niche: similar to the 3.2.2.4.6 method and ACME's HTTP-01 method, they provide a level of automation for systems not directly integrated with DNS, and while that means they must be particularly attentive to the security risks that come from that, done correctly, they can provide a greater path towards security.

- Compared to 3.2.2.4.1 and 3.2.2.4.5, specific applications of 3.2.2.4.9 and 3.2.2.4.10 can be evaluated against possible mitigations for the risk, both short- and long-term, and steps that site operators can take to affirmatively protect themselves offer better assurances than those that rely entirely on the CA's good behaviour. As you call out, the specific risks of TLS-SNI are limited to shared providers (not individual users) that meet certain conditions, and these shared providers can already take existing steps to minimize the immediate risk, such as blocking the use of certificates or SNI negotiations that contain the '.invalid' TLD. While this is not an ideal long-term solution, by any means, it allows us to frame both the immediate and specific risks and the ways to reduce them.

For the sake of brevity, I'll end my comparisons there, but hopefully this highlights some of the factors we've considered in our response to your proposal.

Given the risks identified to 3.2.2.4.9 and 3.2.2.4.10, we think it would be best for CAs using these Baseline Requirements-acceptable methods of validation to begin immediately transitioning away from them, with the goal of either removing them entirely from the Baseline Requirements, or identifying ways in which .9 and .10 can be better specified to mitigate such risks.
That said, given the potential risks to the ecosystem, particularly those with pre-existing short-lived certificates, we think that, provided that the new certificates are valid for 90 days or less, we're open to allowing the specific TLS-SNI methods identified by the ACME specification to continue to be used for a limited time, while the broader community works to identify potential mitigations (if possible) or transition away from these methods.

While we don't think the current status quo represents a viable long-term solution, we think this represents an appropriate short-term balance, given that the ACME TLS-SNI methods have been broadly reviewed within the IETF, that the risks apply to a limited subset of specific infrastructures, that mitigations are possible for these infrastructures to deploy, that Let's Encrypt is actively working with the community to identify, and ideally share, those that haven't deployed or cannot deploy such mitigations, and all of the other items previously mentioned.

If and as new facts become available, it may be necessary to revisit this. We may have overlooked additional risks, or failed to consider mitigating factors. Further, this response is contextualized in the application of ACME's TLS-SNI methods for validation, and such a response may not be appropriate for other forms of validation within the framework of 3.2.2.4.9 and 3.2.2.4.10. Similarly, this response doesn't apply to certificates that may be valid for longer periods, as they may present substantially greater risk to making effective improvements to, or an orderly transition away from, these methods.

We look forward to working with other browser vendors, site operators, and the relying community to work out ways to provide an orderly and effective transition to more secure methods - whether that means away from the 3.2.2.4.9/.10 series of domain validations, or to more restrictive forms that are more clearly "opt-in" rather than the explicit "opt-out" proposed (of 'blacklisting .invalid').

We're also curious if we've overlooked salient details in our response, and thus welcome feedback from Let's Encrypt, other CAs utilizing these validation methods (both TLS-SNI and 3.2.2.4.9 and 3.2.2.4.10), and the broader community as to our proposed next steps.

Please consider this a draft response, and we look forward to future updates regarding proposed next steps.
_______________________________________________
dev-security-policy mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-security-policy

