Speaking from a personal perspective -

This all makes sense, and, to be honest, the spectrum/grade idea isn’t a good
or robust one. Implementing something like that requires too many judgment
calls about whether a CA belongs in box x vs. box y and what the difference
between those two boxes is. I also get the frustration with certain issues,
especially when they pop up across CAs and the rule is well established.

I’ve been looking at the root causes of mis-issuance in detail (starting with
DigiCert), and so far I’ve found they divide into a few buckets: 1) the CA
relied on a third party for something and probably shouldn’t have, 2) there
was an internal engineering issue, 3) a manual process went bad, 4) software
the CA relied on had an issue, or 5) the CA simply couldn’t/didn’t act in
time. From the incidents I’ve categorized so far (still working through the
incidents for all CAs), the biggest bucket looks like engineering issues,
followed by manual process issues. For example, at DigiCert proper,
engineering issues represent about 35% of the total. (By DigiCert proper, I
mean excluding the Sub CAs and QuoVadis systems – this lets me look
exclusively at our internal operations as compared to the operations of
somewhat separate systems.) The next biggest bucket is our failure to move
fast enough (30%), followed by manual process problems (24%). DigiCert proper
doesn’t use much third-party software in its CA, so that tends to be our
smallest bucket.
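
For what it’s worth, the arithmetic behind those percentages is nothing fancy:
each filed incident report gets exactly one root-cause label, and a bucket’s
share is its count over the total. Here is a minimal sketch of that tally
(hypothetical bucket labels and made-up counts for illustration, not our
actual tooling or incident data):

# Hypothetical sketch: tally incident reports into the five buckets above
# and report each bucket's share of the total.
from collections import Counter

BUCKETS = (
    "third-party reliance",    # 1) relied on a third party
    "internal engineering",    # 2) internal engineering issue
    "manual process",          # 3) manual process went bad
    "third-party software",    # 4) software the CA relied on had an issue
    "failure to act in time",  # 5) couldn't/didn't act in time
)

def bucket_shares(incident_labels):
    """Map each bucket to its percentage of all labeled incidents."""
    counts = Counter(incident_labels)
    total = sum(counts.values()) or 1  # avoid division by zero
    return {b: round(100 * counts.get(b, 0) / total) for b in BUCKETS}

# Made-up sample chosen so the output roughly mirrors the DigiCert-proper
# proportions quoted above.
sample = (["internal engineering"] * 7
          + ["failure to act in time"] * 6
          + ["manual process"] * 5
          + ["third-party reliance"] * 2)
print(bucket_shares(sample))
# {'third-party reliance': 10, 'internal engineering': 35,
#  'manual process': 25, 'third-party software': 0,
#  'failure to act in time': 30}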

The division between these categories is interesting because some are less
within the CA’s control than others. For example, if PrimeKey has an issue,
pretty much everyone has an issue, since so many CAs use PrimeKey at some
level (DigiCert via QuoVadis). The division is also somewhat arbitrary and
based solely on the filed incident reports. However, what I’m looking for is
whether the issues result from human error, insufficient implementation
timelines, engineering issues, or software issues. I’m not ready to draw an
industry-wide conclusion yet.

The trend I’ve noticed at DigiCert is that the percentage of issues related to
DigiCert manual processes is decreasing while the percentage of engineering
blips is increasing. This is a good trend, as it means we are moving away from
manual processes and toward better automation. What’s also interesting is that
the number of times we’ve had issues with moving too slowly has dropped
significantly over the last two years, which means we’ve seen substantial
improvement in communication and in handling changes to industry standards.
The total number of issues increased, but I chalk that up to more transparency
and scrutiny by the public (a good thing) rather than to worse systems.

The net result is a nice report that we’re using internally (and will share
externally) that shows where the biggest improvements have been made. We’re
also hoping this data shows where we need to concentrate more. Right now, the
data points to a need for more focus on engineering and unit tests to ensure
all systems are updated when a guideline changes.

So why do I share this data now, before it’s ready? Well, I think looking at
this information may help define possible solutions. Long and windy, but…

One resulting idea: maybe you could require a report on improvements from
each CA based on their issues? The annual audit could include a report
similar to the above, where the CA looks at the past year of its own mistakes
and the other industry issues and evaluates how well it did compared to
previous years. The report could also describe how the CA changed its systems
to comply with any new Mozilla or CAB Forum requirements. What automated
processes did it put in place to guarantee compliance? This part of the audit
report could be used to reflect on the CA’s operations and to suggest to the
browsers where the CA needs to improve and where it needs to automate. It
could also be used to document one area of improvement the CA needs to focus
on.

Although this doesn’t cure immediate mis-issuances, it does give better
transparency into what CAs are doing to improve and exactly how they
implemented the changes made to the Mozilla policy. A report like this also
shifts the burden of dealing with issues to the community instead of the
module owners, and it puts the emphasis on the CA fixing its systems and
learning from mistakes. With the change to WebTrust audits, there’s an
opportunity for more free-form reporting that could include this information.
And this information has to be far more interesting than reading about yet
another individual who forgot to check a box in CCADB.

This is still more reactive than I’d like, and it sometimes requires a whole
year before a CA gives information about the changes made to its systems to
reflect changes in policy. But the report does get people thinking proactively
about what they need to do to improve, which may, by itself, be a force for
improvement. It also allows the community to evaluate a CA’s issues over the
past year, see how the CA addressed what went wrong compared to previous
years, and see what the CA is doing to make the next year even better.


Jeremy



From: Ryan Sleevi <r...@sleevi.com>
Sent: Monday, October 7, 2019 6:45 PM
To: Jeremy Rowley <jeremy.row...@digicert.com>
Cc: mozilla-dev-security-policy 
<mozilla-dev-security-pol...@lists.mozilla.org>; r...@sleevi.com
Subject: Re: Mozilla Policy Requirements CA Incidents



On Mon, Oct 7, 2019 at 7:06 PM Jeremy Rowley
<jeremy.row...@digicert.com> wrote:
Interesting. I can't tell with the Netlock certificate, but the other three 
non-EKU intermediates look like replacements for intermediates that were issued 
before the policy date and then reissued after the compliance date.  The 
industry has established that renewal and new issuance are identical (source?), 
but we know some CAs treat these as different instances.

Source: Literally every time a CA tries to use it as an excuse? :)

My question is how we move past “CAs provide excuses”, and at what point the 
same excuses fall flat?

While that's not an excuse, I can see why a CA could have issues with a renewal 
compared to new issuance as changing the profile may break the underlying CA.

That was QuoVadis’s explanation, although with no detail to support that it
would break anything – simply that they don’t review the things they sign.
Yes, I’m frustrated that CAs continue to struggle with anything that is not
entirely supervised. What’s the point of trusting a CA then?

However, there’s probably something better than “trust” vs. “distrust” or
“revoke” vs. “non-revoke”, especially when it comes to an intermediate. I
guess the question is: what is the primary goal for Mozilla? Protect users?
Enforce compliance? They are not mutually exclusive objectives, of course,
but the primary driver may influence how to treat issuing CA non-compliance
vs. end-entity non-compliance.

I think a minimum goal is to ensure the CAs they trust are competent and take
their job seriously, fully aware of the risk they pose. I am more concerned
about issues like this, which CAs like QuoVadis acknowledge they would not
cause.

The suggestion of a spectrum of responses fundamentally suggests root stores
should eat the risk caused by CAs’ flagrant violations. I want to understand
why browsers should continue to be left holding the bag, and why every effort
at compliance seems to fall on how much the browsers push.

Of the four, only QuoVadis has responded to the incident with real
information, and none of them have filed a report in the required format or
given sufficient information. Is it too early to say what happens before
there is more information about what went wrong? Key ceremonies are,
unfortunately, very manual beasts. You can automate a lot of the process with
scripting tools, but taking a key out, performing a ceremony, and putting
things away is not automated, due to the offline root and FIPS 140-3
requirements.

Yes, I think it’s appropriate to defer discussing what should happen to these
specific CAs. However, I don’t think it’s too early to begin trying to
understand why it continues to be so easy to find massive amounts of
misissuance, and why policies that are clearly communicated and require
affirmative consent are something CAs are still messing up. It suggests that
trying to improve things by strengthening requirements isn’t helping as much
as needed, and that perhaps more consistent distrusting is a better solution.

In any event, having CAs share the challenges is how we do better.
Understanding how the CAs that were not affected prevent these issues is
equally important. We NEED CAs to be better here, so what’s the missing piece
that explains why it’s working for some and failing for others?

I know it seems extreme to suggest starting to distrust CAs over this, but
every single time, it seems there’s a CA communication, affirmative consent,
and then failure. The most recent failure to disclose CAs is equally
disappointing and frustrating, and it’s not clear we have CAs adequately
prepared to comply with 2.7, no matter how much we try.