We’d like to offer our own perspective on this issue, having lived it firsthand, in case this perspective is valuable to the community.
It’s important to understand that while the total number of affected certificates was on the order of 100,000, the actual number of affected domains was about 1% of that. It just happened that a large number of certificates used a few of these domain names. That matters because the exercise was to detect what ultimately turned out to be about 1000 domains with a DCV problem among the vast number of domains for which we perform DCV every year. The exercise was also to isolate and eliminate the certificates with one or more incorrectly validated domains, not to make a sweeping revocation that would have gotten all the affected domains and a large number of unaffected certificates as well.
This last point is important. It would have been fast and easy to create a query that caught 100% of this misissuance but also revoked an order of magnitude more certificates that were perfectly fine. What slowed down the investigation was examining all domains in our corpus of active certificates for the many possible ways that DCV could have occurred. The tangled skein we had to investigate included these factors:
- The numbers of domains and certificates in question were both very large.
- The same domain name could have undergone DCV multiple times in multiple ways for multiple different certificates.
- Many certificates contain multiple (and in some cases hundreds of) SANs.
- Isolating certificates for revocation had to occur on a certificate-by-certificate basis, not a domain-by-domain basis.
- Frequent reissuance among hosting partners means the total number of certificates to be tracked and DCV events to be traced is that much higher and more complex.
The key idea here is that the first DCV result that returns from an initial query may not be the only DCV event that actually occurred.
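The difference between a sweeping revocation and a certificate-by-certificate check can be illustrated with a small Python sketch. This is only an illustration under assumed record shapes: the `DcvEvent` and `Certificate` types, the `compliant` flag, and the sample data are all hypothetical and do not reflect any CA's actual schema. It also matches domains by exact name, which ignores the real rules for validating parent domains.

```python
from dataclasses import dataclass
from datetime import date, timedelta

REUSE_WINDOW = timedelta(days=825)  # reuse period mentioned in the discussion


@dataclass(frozen=True)
class DcvEvent:              # hypothetical record of one DCV attempt
    domain: str
    validated_on: date
    compliant: bool          # False if the validation was one of the flawed ones


@dataclass(frozen=True)
class Certificate:           # hypothetical, heavily simplified certificate
    serial: str
    issued_on: date
    sans: tuple[str, ...]


def has_valid_dcv(domain, issued_on, events):
    """True if ANY compliant DCV event for the domain lies in the reuse window."""
    return any(
        e.domain == domain
        and e.compliant
        and timedelta(0) <= issued_on - e.validated_on <= REUSE_WINDOW
        for e in events
    )


def certificates_to_revoke(certs, events):
    """Per-certificate check: flag a certificate if at least one SAN lacks valid DCV."""
    return [
        c.serial
        for c in certs
        if not all(has_valid_dcv(d, c.issued_on, events) for d in c.sans)
    ]


def sweeping_revocation(certs, suspect_domains):
    """Domain-based shortcut: flag every certificate that names a suspect domain."""
    return [c.serial for c in certs if any(d in suspect_domains for d in c.sans)]


EVENTS = [
    DcvEvent("example.com", date(2021, 1, 10), compliant=False),  # flawed event
    DcvEvent("example.com", date(2020, 6, 1), compliant=True),    # earlier, valid
    DcvEvent("bad.example", date(2021, 1, 10), compliant=False),  # only a flawed one
    DcvEvent("other.example", date(2020, 12, 1), compliant=True),
]
CERTS = [
    Certificate("A", date(2021, 2, 1), ("example.com",)),
    Certificate("B", date(2021, 2, 1), ("example.com", "bad.example")),
    Certificate("C", date(2021, 2, 1), ("other.example",)),
]

print(sweeping_revocation(CERTS, {"example.com", "bad.example"}))
print(certificates_to_revoke(CERTS, EVENTS))
```

With this sample data the sweeping rule flags both A and B, while the per-certificate check flags only B, because example.com has an earlier compliant validation inside the window. Finding that second event is the step that requires looking at every DCV record for a domain rather than only the first one returned.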
We did have records of all these events, which is ultimately how we were able to execute this task, but as Ryan points out, this was one of those “data lake” situations, as we had to dig back into deeper records of our systems’ behavior. It was, in fact, straightforward and reasonably fast to create that first list of suspect certificates for which we could not confirm that the “DCV reuse” had occurred within 825 days. In another circumstance that might have been the end of the query and we would have had our results. The problem here was that the reliance on DCV reuse was the very part of the system that was suspect, and so to put it under the magnifying glass we had to go to the very bottom of the data lake. In other words, the fact that a particular certificate had DCV reuse marked incorrectly didn’t necessarily mean that DCV hadn’t occurred for that same domain in the specified time period, just that our primary record for that certificate didn’t indicate that this had happened.
In response to that problem we have a ticket in to create a new table that will log our successful BR-compliant DCV checks in a manner that will make this kind of search considerably faster and easier to perform in the future.
Likewise, if the exercise had been to look at a single certificate or relatively few certificates, we could have found the answer very quickly, in the “minutes not days” that Watson asks about. However, if the request is for every large-volume, global CA to be able at any time to perform an expansive search of every active certificate it has on any single, unpredictable criterion that may be thrown its way and get a result back in minutes, that is a very difficult thing to be able to perform under any and all circumstances.
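To make the idea of a dedicated log of successful DCV checks concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Everything in it is an assumption for illustration: the table names `dcv_log` and `cert_san`, their columns, the placeholder method labels, and the sample rows are hypothetical, and a production design would differ in many ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical log: one row per successful, compliant DCV check.
    CREATE TABLE dcv_log (
        domain       TEXT NOT NULL,
        method       TEXT NOT NULL,
        validated_on TEXT NOT NULL          -- ISO 8601 date
    );
    -- Index so lookups by domain and date do not scan the whole log.
    CREATE INDEX idx_dcv_domain_date ON dcv_log (domain, validated_on);

    -- Hypothetical table with one row per (certificate, SAN) pair.
    CREATE TABLE cert_san (
        serial    TEXT NOT NULL,
        issued_on TEXT NOT NULL,            -- ISO 8601 date
        domain    TEXT NOT NULL
    );
    """
)

conn.executemany(
    "INSERT INTO dcv_log VALUES (?, ?, ?)",
    [
        ("example.com", "method-a", "2020-06-01"),
        ("old.example", "method-b", "2018-01-01"),  # outside the 825-day window
    ],
)
conn.executemany(
    "INSERT INTO cert_san VALUES (?, ?, ?)",
    [
        ("A", "2021-02-01", "example.com"),
        ("B", "2021-02-01", "example.com"),
        ("B", "2021-02-01", "old.example"),
        ("C", "2021-02-01", "missing.example"),     # no DCV record at all
    ],
)


def unvalidated_serials(db, window_days=825):
    """Serials with at least one SAN lacking a logged DCV inside the window."""
    rows = db.execute(
        """
        SELECT DISTINCT s.serial
        FROM cert_san AS s
        WHERE NOT EXISTS (
            SELECT 1
            FROM dcv_log AS d
            WHERE d.domain = s.domain
              AND d.validated_on <= s.issued_on
              AND julianday(s.issued_on) - julianday(d.validated_on) <= ?
        )
        ORDER BY s.serial
        """,
        (window_days,),
    )
    return [serial for (serial,) in rows]


print(unvalidated_serials(conn))
```

The point of the sketch is that once every successful check is logged in one place and indexed by domain, the question "which certificates have a SAN with no valid DCV in the window" becomes a single query instead of a reconstruction from several deeper records.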
Another way to think about this: is the CA’s database meant to be something where all conceivable questions must be answerable immediately, or is it more reasonable to expect that, for unexpected and complex questions involving large numbers of certificates, the CA can perform a data investigation and return with answers after “days not minutes”?

On Monday, July 19, 2021 at 10:33:58 AM UTC-4 Ryan Sleevi wrote:

> On Mon, Jul 19, 2021 at 9:28 AM 'Matthias van de Meent' via
> [email protected] <[email protected]> wrote:
>
>> If a CA can't find its re-use of validation information in their audit
>> logs (as described in BR s5.4), then I believe that BR s5.4 was not
>> correctly implemented by that CA.
>>
>
> I wish it were that simple.
>
> At least one major CA (representing a non-trivial amount of issuance) has
> stated that they maintain their audit logs as paper records. This is also
> why changes to validation methods/reuse have, in the past, faced stiff
> opposition - because some CAs are concerned with the cost and time simply
> to determine who would be affected.
>
> This is, sadly, the distinction between "logged" and "searchable".
>
> We've equally seen a number of CA incidents where CAs maintain the data in
> databases or so-called "data lakes", but then find it difficult to search.
> Sectigo's bug is an example of the complexity of searching across disparate
> datasets.
>
> At present, we largely rely on CAs to "Do the Right Thing" and prepare for
> the worst case, and design their systems in a way that can support
> investigations robustly and rapidly. In practice, however, we know that's
> often far from the case. The Detailed Control Reports specifically aim to
> provide greater insight into the system design and how it's measured, to
> allow the development and harmonization of good practice, but a number of
> CAs oppose that for cost.
> Worse, however, is that certain large audit firms
> are concerned that having such detailed reports would jeopardize their
> audit business, because of the reputational risk from revealing how their
> audits are worse quality compared to both their competitors and the
> overarching goal.
>
