Re: Broader lessons from 1718771

Tim Callan Thu, 05 Aug 2021 16:07:02 -0700


We’d like to offer our own perspective on this issue, having lived it 
firsthand, in case this perspective is valuable to the community.

It’s important to understand that while the total number of affected 
certificates was on the order of 100,000, the actual number of affected 
domains was about 1% of that.  It just happened that a large number of 
certificates used a few of these domain names.  That matters for two 
reasons.  First, the exercise was to detect what ultimately turned out to be 
about 1000 domains with a DCV problem among the vast number of domains for 
which we perform DCV every year.  Second, the exercise was to isolate and 
eliminate the certificates with one or more incorrectly validated domains, 
not to make a sweeping revocation that would have gotten all the affected 
domains and a large number of unaffected certificates as well.

This last point is important.  It would have been fast and easy to create a 
query that caught 100% of this misissuance but that also revoked an order of 
magnitude more certificates that were perfectly fine.  What slowed down the 
investigation was examining all domains in our corpus of active 
certificates for the many possible ways that DCV could have occurred.  The 
tangled skein we had to investigate included these factors:

   - The numbers of domains and certificates in question were both very 
   large.
   - The same domain name could have undergone DCV multiple times in 
   multiple ways for multiple different certificates.
   - Many certificates contain multiple (and in some cases hundreds of) 
   SANs.
   - Isolating certificates for revocation had to occur on a 
   certificate-by-certificate basis, not a domain-by-domain basis.
   - Frequent reissuance among hosting partners makes the set of 
   certificates to be tracked and DCV events to be traced that much larger 
   and more complex.

The key idea here is that the first DCV result that returns from an initial 
query may not be the only DCV event that actually occurred.  We did have 
records of all these events, which is ultimately how we were able to 
execute this task, but as Ryan points out, this was one of those “data 
lake” situations, as we had to dig back into deeper records of our systems’ 
behavior.
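
As an illustration only, the certificate-by-certificate check described 
above can be sketched as follows.  This is a minimal sketch, not Sectigo's 
actual tooling: the `DcvEvent` and `Certificate` records, the `compliant` 
flag, and both function names are hypothetical.  The only value taken from 
the discussion is the 825-day reuse window.  The point it shows is that every 
recorded DCV event for a domain has to be considered, and that one 
uncovered SAN is enough to make the whole certificate suspect.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Tuple

REUSE_WINDOW = timedelta(days=825)  # reuse limit mentioned in the post


@dataclass(frozen=True)
class DcvEvent:
    """One recorded validation of one domain (hypothetical record shape)."""
    domain: str
    validated_on: date
    compliant: bool  # assumption: each event is marked BR-compliant or not


@dataclass(frozen=True)
class Certificate:
    """A certificate and the domains in its SAN list (hypothetical shape)."""
    serial: str
    issued_on: date
    sans: Tuple[str, ...]


def domain_covered(domain: str, issued_on: date,
                   events: Iterable[DcvEvent]) -> bool:
    """True if ANY compliant event for the domain falls inside the window.

    Looking only at the first event found is not enough: the same domain
    may have been validated several times, in several ways.
    """
    return any(
        e.domain == domain
        and e.compliant
        and timedelta(0) <= issued_on - e.validated_on <= REUSE_WINDOW
        for e in events
    )


def certificates_to_revoke(certs: Iterable[Certificate],
                           events: List[DcvEvent]) -> List[str]:
    """Return serials of certificates with at least one uncovered SAN."""
    return [
        c.serial
        for c in certs
        if not all(domain_covered(d, c.issued_on, events) for d in c.sans)
    ]
```

A linear scan like this is only meant to show the logic.  At the scale 
described in this thread, the same test would have to run against indexed 
storage rather than an in-memory list.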

It was, in fact, straightforward and reasonably fast to create that first 
list of suspect certificates for which we could not confirm that the “DCV 
reuse” had occurred within 825 days.  In another circumstance that might 
have been the end of the query and we would have had our results.  The 
problem here was that the reliance on DCV reuse was the very part of the 
system that was suspect, and so to put it under the magnifying glass we had 
to go to the very bottom of the data lake.

In other words, the fact that a particular certificate had DCV reuse marked 
incorrectly didn’t necessarily mean that DCV hadn’t occurred for that same 
domain in the specified time period, just that our primary record for that 
certificate didn’t indicate that this had happened.  In response to that 
problem we have a ticket in to create a new table that will log our 
successful BR-compliant DCV checks in a manner that will make this kind of 
search considerably faster and easier to perform in the future.
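
To make the idea of a dedicated DCV log concrete, here is a minimal sketch 
using Python's standard `sqlite3` module.  The table name `dcv_log`, its 
columns, the index, and the helper functions are all assumptions made for 
illustration; the post does not describe the actual schema.  The sketch 
shows why such a table speeds up the search: each successful check is one 
row, and an index on (domain, date) answers "was this domain validated 
within 825 days of issuance?" directly.

```python
import sqlite3
from datetime import date, timedelta
from typing import Optional

REUSE_WINDOW_DAYS = 825  # reuse limit mentioned in the post


def open_dcv_log(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) log table and its lookup index."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS dcv_log (
               domain       TEXT NOT NULL,
               validated_on TEXT NOT NULL,  -- ISO 8601 date, sorts as text
               method       TEXT NOT NULL   -- free-form label, an assumption
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS dcv_log_lookup "
        "ON dcv_log (domain, validated_on)"
    )
    return conn


def record_check(conn: sqlite3.Connection, domain: str,
                 validated_on: date, method: str) -> None:
    """Log one successful DCV check."""
    conn.execute(
        "INSERT INTO dcv_log (domain, validated_on, method) VALUES (?, ?, ?)",
        (domain, validated_on.isoformat(), method),
    )


def latest_valid_check(conn: sqlite3.Connection, domain: str,
                       issued_on: date) -> Optional[date]:
    """Most recent logged check inside the reuse window, or None."""
    earliest = issued_on - timedelta(days=REUSE_WINDOW_DAYS)
    row = conn.execute(
        "SELECT MAX(validated_on) FROM dcv_log "
        "WHERE domain = ? AND validated_on BETWEEN ? AND ?",
        (domain, earliest.isoformat(), issued_on.isoformat()),
    ).fetchone()
    return date.fromisoformat(row[0]) if row[0] else None
```

Storing dates as ISO 8601 text keeps the sketch portable, since such strings 
compare in chronological order.  A production system would presumably use 
native timestamp types and record more detail per check.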

Likewise, if the exercise had been to look at a single certificate or 
relatively few certificates, we could have found the answer very quickly, 
in the “minutes not days” that Watson asks about.  However, if the request 
is for every large-volume, global CA to be able at any time to run an 
expansive search of every active certificate it has, on any single 
unpredictable criterion that may be thrown its way, and get a result back in 
minutes, that is very difficult to guarantee under any and all 
circumstances.  Put another way: is the CA’s database meant to be something 
where all conceivable questions must be answerable immediately, or is it 
more reasonable to expect that for unexpected and complex questions 
involving large numbers of certificates the CA can perform a data 
investigation and return with answers after “days not minutes”?

On Monday, July 19, 2021 at 10:33:58 AM UTC-4 Ryan Sleevi wrote:

> On Mon, Jul 19, 2021 at 9:28 AM 'Matthias van de Meent' via 
> [email protected] <[email protected]> wrote:
>
>> If a CA can't find its re-use of validation information in their audit
>> logs (as described in BR s5.4), then I believe that BR s5.4 was not
>> correctly implemented by that CA.
>>
>
> I wish it were that simple.
>
> At least one major CA (representing a non-trivial amount of issuance) has 
> stated that they maintain their audit logs as paper records. This is also 
> why changes to validation methods/reuse have, in the past, faced stiff 
> opposition - because some CAs are concerned with the cost and time simply 
> to determine who would be affected.
>
> This is, sadly, the distinction between "logged" and "searchable".
>
> We've equally seen a number of CA incidents where CAs maintain the data in 
> databases or so-called "data lakes", but then find it difficult to search. 
> Sectigo's bug is an example of the complexity of searching across disparate 
> datasets.
>
> At present, we largely rely on CAs to "Do the Right Thing" and prepare for 
> the worst case, and design their systems in a way that can support 
> investigations robustly and rapidly. In practice, however, we know that's 
> often far from the case. The Detailed Control Reports specifically aim to 
> provide greater insight into the system design and how it's measured, to 
> allow the development and harmonization of good practice, but a number of 
> CAs oppose that for cost. Worse, however, is that certain large audit firms 
> are concerned that having such detailed reports would jeopardize their 
> audit business, because of the reputational risk from revealing how their 
> audits are worse quality compared to both their competitors and the 
> overarching goal.
>

-- 
To view this discussion on the web visit 
https://groups.google.com/a/mozilla.org/d/msgid/dev-security-policy/635fc27f-6b9f-449b-a9b4-cdcc1104f495n%40mozilla.org.
