Re: [Acme] Stale DNS and high-risk issuance

Kevin Borgolte Thu, 20 Jul 2017 23:22:13 -0700

Hi Ilari,

thanks for the thoughtful and fast response, we certainly did not
expect one this quickly!


> On Thu, Jul 20, 2017 at 12:18:10AM -0700, Kevin Borgolte wrote:
> > Hi everyone,
> >
> > we are currently conducting a measurement study about DNS staleness issues
> > with a focus on IP address churning. We encountered the issue that Let's
> > Encrypt (and ACME in general) can be (ab)used to request and receive a
> > valid certificate for a domain, as long as the attacker obtains access to
> > the IP address to which the DNS record points. This can happen due to stale
> > DNS records (e.g., pointing to cloud providers), or through MitM attacks
> > (e.g., as recently observed when an (accidental) BGP hijack re-routed all
> > major payment processors via Moscow). The issued certificates are
> > independent of the actual IP address used for the fraudulent certificate
> > request, and can later be used for malicious purposes (local interception
> > etc.).
> >
> > Although it is a valid certificate request, we think that the ACMEv2
> > standard should mention practical ways to prevent these cases. This is
> > especially the case, as the malicious certificate issuances may not
> > necessarily attract attention through certificate transparency logs (as the
> > domain points to the same IP address before/after).

> The methods in base spec just fundamentially rely on security of DNS.
> The only way to remove dependence on routing would be:
>
> 1) Deploy DNSSEC and
> 2) Add acme-CAA record restricting validation to DNS.

This would indeed be an alternative solution to our proposal, as it
would similarly solve the problem of DNS-related issues. However, a
DNS-based solution would not be as practical for most ACME users (yet)
because they might not have full control over their nameservers (the
web interface of their registrar that they might have to use might not
even support CAA), or it might require manual work to renew the
certificate. Further considering regular key renewals, a DNS-based
solution appears to be less practical and could hamper adoption.

> > For example: Consider that Let's Encrypt is used by a service running
> > a.example.com, where the A record of a.example.com points to 192.0.2.1 and
> > 198.51.100.1, with the record pointing to 198.51.100.1 being a stale DNS
> > record, and 198.51.100.1 being an address for a VM instance in a cloud
> > provider's pool. If an attacker now finds a way to allocate 198.51.100.1,
> > she might also use Let's Encrypt to obtain a valid certificate. The
> > certificate request by the attacker is indistinguishable from the outside
> > and from interpreting the certificate transparency logs from what the
> > legitimate operator would do to fix the service behind 198.51.100.1, which
> > previously went stale.

> Stale IP addresses cause problems when renewing (however, that does not
> occur that often). Also, due to recent "fixes", IPv6 stale addresses can
> be missed (there are just so many folks with busted DNS configuraiton).
>
> And I think that would also cause large amounts of problems for users
> if the stale record is IPv4 (if it is IPv6, then "happy eyeballs" hides
> the problem).

We are not 100% sure what you mean exactly here, but for us, the
problem with stale DNS records is not with renewing itself, but with
an attacker exploiting the stale DNS information combined with the
high IP address churn (which we see particularly for cloud providers).

Our definition of stale DNS seems to be slightly different or broader
than yours: specifically, we also consider domains as stale for which
there is only a single A record, and not only cases in which a single IP
address out of multiple might be stale. For instance, a practice we
encountered quite often is that VM instances on a cloud provider are
freed and the associated IP addresses are released, but the respective
DNS records are not updated for some days. In turn, someone else can
claim the same IP address and then request a certificate for the
domain. In practice, a few hours are often sufficient to claim a
released address and successfully launch the attack (which might be
possible with caching, depending on the nameservers used), and we have
seen some cases with customer-specific sub-domains being susceptible
to the attack (which might be a worthwhile phishing target).
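
As a rough illustration of the staleness condition described above, the
following sketch flags A records that point to addresses the operator no
longer holds. The function name and the idea of an operator-side list of
currently allocated addresses are assumptions made for this example; they
are not part of ACME or of the measurement study.

```python
import ipaddress


def stale_a_records(a_records, allocated_ips):
    """Return the A records that point to addresses the operator no longer holds.

    a_records:     addresses currently published in DNS for the domain
    allocated_ips: addresses the operator still has allocated (hypothetical
                   inventory, e.g., exported from a cloud provider account)
    """
    allocated = {ipaddress.ip_address(ip) for ip in allocated_ips}
    return [ip for ip in a_records if ipaddress.ip_address(ip) not in allocated]


# a.example.com publishes two A records, but the VM behind 198.51.100.1
# has been released back into the provider's pool.
print(stale_a_records(["192.0.2.1", "198.51.100.1"], ["192.0.2.1"]))
```

A domain with only a single A record is stale in the same sense if that one
address is no longer allocated to the operator.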

> > We also propose, focusing on high-risk targets, a stricter issuance policy:
> >
> > If a valid certificate (e.g., issued by the same operator or a set of
> > operators, checked via CT logs) exists for the requested domain, then the
> > current challenge should be signed by the key. If the last certificate has
> > expired, a grace period set by operators could apply (e.g., 1 month or 3
> > months). If the expiration date has passed a long time ago, or if no grace
> > period is used by the operator, then a second channel should be used to
> > verify the request (e.g., DNS CRP).

> There was PoP challenge in earlier versions of ACME spec for exactly
> that. However, it was removed, because it seemed to be more trouble
> than worth, and because it signed JSON messages with key intended for
> something else, which is cryptographically not kosher (and contexts are
> unusable in real world).

We should have been more clear here: rather than signing JSON
messages, the authorization should happen over HTTPS rather than HTTP,
with the current certificate being used for signing/TLS. There would
be no non-kosher JSON signing.

> In the "order flow" change, did put the CSR first so one can look up
> the requested key first. However, this would cause major problems if
> trying to change keys (and many (I am not among them) believe that
> rotating private keys often is important).

Would you mind elaborating on where exactly this would cause major
problems when changing keys?

Following 7.4 "Applying for Certificate Issuance," the CSR should be
part of the initial POST request. Therefore, to verify that the
certificate can be issued according to the stricter policy, the ACME
server could simply use the HTTP challenge adapted for HTTPS before
issuing the certificate, verifying ownership of the old/current
certificate non-intrusively. Since the HTTP challenge must be done
over HTTP, not HTTPS, there is also no potential conflict.

> This would also complicate "disaster recovery" where the previous keys
> are lost (usually due to admin mistake, and backups being AWOL like
> usual). You would not believe how often that happens (and I hear just
> a small subset).

Correct, disaster recovery would be made slightly more manual by
requiring a second channel to verify the request (e.g., setting a DNS
record in addition to HTTPS). Although this might happen quite
frequently, we strongly believe that we should opt for secure by
default, instead of allowing a practical attack because of disaster
recovery eventualities. The burden of our solution is not significant
in that case (see below) and is what one would expect when recovering
lost credentials (e.g., for your email account): a second channel.

> > This stricter issuance policy has the following benefits:
> >
> > * No burden in the normal and regular case
> > * No (significant) burden if a domain changes ownership legitimately
> > * Only minor and a one-time burden if a certificate has expired recently
> > (depending on a grace period being used)

> Err, what? These don't seem even remotely right to me.

From what we can tell, most users are/will be using ACME through an
ACME client and via HTTP authorization (e.g., LE). For our proposal,
the entire process would be transparent for them (case 1). If
necessary, a simple prompt can be thrown to instruct users to select a
second channel (email, DNS, etc.), with instructions to set them up
(case 2 and 3). For cases 2 and 3, after the initial request, the
domain would fall back into case 1, which does not require any user
interaction and is transparent to the user. In general, the
alternative solution, using DNSSEC and restricting ACME to DNS, would
be much more cumbersome for users, as they need to set DNS records
(often manually).
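
The three cases can be sketched as a small decision function on the server
side. This is only a sketch under stated assumptions: the names are
hypothetical, a first-time issuance (no prior certificate) is assumed to
fall back to the plain HTTP challenge, and an expired certificate is assumed
to remain acceptable as proof during the grace period. None of this is taken
from the ACME draft.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Check(Enum):
    HTTP = "plain HTTP challenge"
    HTTPS_PRIOR_CERT = "HTTPS challenge served with the prior certificate"
    SECOND_CHANNEL = "second channel (e.g., DNS or email)"


@dataclass
class PriorCert:
    # Expiry of the most recent certificate found for the domain (e.g., via CT logs).
    not_after: datetime


def required_check(prior: Optional[PriorCert], now: datetime,
                   holds_prior_key: bool,
                   grace: timedelta = timedelta(days=30)) -> Check:
    """Pick the validation a requester must pass under the stricter policy."""
    if prior is None:
        # Assumption: no certificate on record, so nothing to prove against.
        return Check.HTTP
    if now > prior.not_after + grace:
        # Expired longer ago than the operator's grace period (case 3).
        return Check.SECOND_CHANNEL
    if holds_prior_key:
        # Normal renewal, transparent to the user (case 1).
        return Check.HTTPS_PRIOR_CERT
    # Ownership change or lost key while a certificate is on record (case 2).
    return Check.SECOND_CHANNEL


now = datetime(2017, 7, 20)
print(required_check(PriorCert(now + timedelta(days=60)), now, holds_prior_key=True))
print(required_check(PriorCert(now + timedelta(days=60)), now, holds_prior_key=False))
print(required_check(PriorCert(now - timedelta(days=120)), now, holds_prior_key=True))
```

Setting grace to timedelta(0) models an operator that does not use a grace
period at all.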

> > At the same time, it prevents issuing certificates for domains that have
> > stale DNS records (due to caching or not updated yet). Furthermore, it is a
> > solution independent of CAA records, which is not a solution for all
> > problems arising from stale DNS records (e.g., if a DNS record is stale, it
> > points (A RR) to a cloud IP that an attacker gained control over, a CA is
> > designated as allowed to issue certificates via a CAA RR, and the
> > designated account at the CA is using an email address pointing to the same
> > domain; in this case, the attacker can request a certificate valid for some
> > months, giving ample time for an attack).

> Caching should not be a big issue: LE limits the cache validity to
> pretty short values.
>
> How many times has LE issued certificate to wrong actor due to stale
> DNS data (I am aware of at least one DNS hijack incident, but
> protecting against those is considered too high bar for CAs, even
> DNSSEC would not have helped).

Generally, that is a very difficult (if not impossible) question to
answer: it does not show up in CT logs as problematic because the
attack looks like a common renewal. To determine whether it was a
staleness issue and that the certificate request is illegitimate, one
needs complementary and historical DNS and IP and port liveness data
(particularly the latter is almost impossible to get).

We focus our data on domains for which a certificate could potentially
be claimed and for which there is DNS traffic on the Internet. Our
results show conclusively that the attack is practical for a large
number of cases.

> > In respect to high-risk targets, we are seeing significant staleness issues
> > for domains pointing to IP addresses at various cloud providers in our
> > study. Additionally, from our results, requesting the respective IP
> > addresses quickly and automatically is practical for an attacker.
> > Therefore, we would recommend to classify domains pointing to cloud
> > providers as high-risk targets; practically, this should be a decision made
> > by the operator and the standard should only inform the operator about the
> > issue and possible solution.

> Unfortunately domains pointing at cloud provoders is pretty common,
> and classifying those as high-risk would cause major issues for
> legimate users.

With our proposed solution, the only time the user would see a burden
is if a domain changed ownership legitimately while a certificate is
still valid (case 2), or if the certificate renewal lapsed (case 3).
In the normal case, legitimate users should see no difference and
encounter no issues. Naturally, system administrator experience is a
driving force for HTTPS adoption, and it is why we think our solution
has significant potential.

Best,
Kevin

On Thu, Jul 20, 2017 at 2:52 AM, Ilari Liusvaara
<[email protected]> wrote:
> On Thu, Jul 20, 2017 at 12:18:10AM -0700, Kevin Borgolte wrote:
>> Hi everyone,
>>
>> we are currently conducting a measurement study about DNS staleness issues
>> with a focus on IP address churning. We encountered the issue that Let's
>> Encrypt (and ACME in general) can be (ab)used to request and receive a
>> valid certificate for a domain, as long as the attacker obtains access to
>> the IP address to which the DNS record points. This can happen due to stale
>> DNS records (e.g., pointing to cloud providers), or through MitM attacks
>> (e.g., as recently observed when an (accidental) BGP hijack re-routed all
>> major payment processors via Moscow). The issued certificates are
>> independent of the actual IP address used for the fraudulent certificate
>> request, and can later be used for malicious purposes (local interception
>> etc.).
>>
>> Although it is a valid certificate request, we think that the ACMEv2
>> standard should mention practical ways to prevent these cases. This is
>> especially the case, as the malicious certificate issuances may not
>> necessarily attract attention through certificate transparency logs (as the
>> domain points to the same IP address before/after).
>
> The methods in base spec just fundamentially rely on security of DNS.
> The only way to remove dependence on routing would be:
>
> 1) Deploy DNSSEC and
> 2) Add acme-CAA record restricting validation to DNS.
>
>> For example: Consider that Let's Encrypt is used by a service running
>> a.example.com, where the A record of a.example.com points to 192.0.2.1 and
>> 198.51.100.1, with the record pointing to 198.51.100.1 being a stale DNS
>> record, and 198.51.100.1 being an address for a VM instance in a cloud
>> provider's pool. If an attacker now finds a way to allocate 198.51.100.1,
>> she might also use Let's Encrypt to obtain a valid certificate. The
>> certificate request by the attacker is indistinguishable from the outside
>> and from interpreting the certificate transparency logs from what the
>> legitimate operator would do to fix the service behind 198.51.100.1, which
>> previously went stale.
>
> Stale IP addresses cause problems when renewing (however, that does not
> occur that often). Also, due to recent "fixes", IPv6 stale addresses can
> be missed (there are just so many folks with busted DNS configuraiton).
>
> And I think that would also cause large amounts of problems for users
> if the stale record is IPv4 (if it is IPv6, then "happy eyeballs" hides
> the problem).
>
>> We also propose, focusing on high-risk targets, a stricter issuance policy:
>>
>> If a valid certificate (e.g., issued by the same operator or a set of
>> operators, checked via CT logs) exists for the requested domain, then the
>> current challenge should be signed by the key. If the last certificate has
>> expired, a grace period set by operators could apply (e.g., 1 month or 3
>> months). If the expiration date has passed a long time ago, or if no grace
>> period is used by the operator, then a second channel should be used to
>> verify the request (e.g., DNS CRP).
>
> There was PoP challenge in earlier versions of ACME spec for exactly
> that. However, it was removed, because it seemed to be more trouble
> than worth, and because it signed JSON messages with key intended for
> something else, which is cryptographically not kosher (and contexts are
> unusable in real world).
>
> In the "order flow" change, did put the CSR first so one can look up
> the requested key first. However, this would cause major problems if
> trying to change keys (and many (I am not among them) believe that
> rotating private keys often is important).
>
> This would also complicate "disaster recovery" where the previous keys
> are lost (usually due to admin mistake, and backups being AWOL like
> usual). You would not believe how often that happens (and I hear just
> a small subset).
>
>> This stricter issuance policy has the following benefits:
>>
>> * No burden in the normal and regular case
>> * No (significant) burden if a domain changes ownership legitimately
>> * Only minor and a one-time burden if a certificate has expired recently
>> (depending on a grace period being used)
>
> Err, what? These don't seem even remotely right to me.
>
>> At the same time, it prevents issuing certificates for domains that have
>> stale DNS records (due to caching or not updated yet). Furthermore, it is a
>> solution independent of CAA records, which is not a solution for all
>> problems arising from stale DNS records (e.g., if a DNS record is stale, it
>> points (A RR) to a cloud IP that an attacker gained control over, a CA is
>> designated as allowed to issue certificates via a CAA RR, and the
>> designated account at the CA is using an email address pointing to the same
>> domain; in this case, the attacker can request a certificate valid for some
>> months, giving ample time for an attack).
>
> Caching should not be a big issue: LE limits the cache validity to
> pretty short values.
>
> How many times has LE issued certificate to wrong actor due to stale
> DNS data (I am aware of at least one DNS hijack incident, but
> protecting against those is considered too high bar for CAs, even
> DNSSEC would not have helped).
>
>> While one could argue that an attacker could use a different
>> domain-validating CA all together, HTTP Public Key Pinning could prevent
>> her attacks in practice.
>
> Unfortunately, HPKP is pretty hard to deploy:
>
> CA HPKP has two major problems:
>
> - CAs rotating intermediate keys in unpredictable ways (upcoming LE
>   ECDSA intermediate anyone?).
> - Chain-building problems, especially with Chrome. Sending the pinned
>   intermediate in legal chain does not imply that HPKP will pass!
>
> EE HPKP does not fail due to CAs rotating keys or chain-building
> issues, however:
>
> - One needs to be serious about backups.
>
>
> And EE HPKP would take care of this problem by itself anyway.
>
>> In respect to high-risk targets, we are seeing significant staleness issues
>> for domains pointing to IP addresses at various cloud providers in our
>> study. Additionally, from our results, requesting the respective IP
>> addresses quickly and automatically is practical for an attacker.
>> Therefore, we would recommend to classify domains pointing to cloud
>> providers as high-risk targets; practically, this should be a decision made
>> by the operator and the standard should only inform the operator about the
>> issue and possible solution.
>
> Unfortunately domains pointing at cloud provoders is pretty common,
> and classifying those as high-risk would cause major issues for
> legimate users.
>
>> Please let us know what you think about the proposal. We would also be
>> happy to share a version of our paper once we have finalized it for
>> submission in the upcoming weeks.
>
> I think it is a bad bad idea.
>
>
> -Ilari

_______________________________________________
Acme mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/acme
