Thanks for filing this, Jeremy. If I understand correctly, the question DigiCert is asking is: "If we submitted this as an incident report, would it be likely that conversations about distrusting DigiCert would begin?", and that's what you're trying to gauge from the community?
I think Wayne's already captured the "We need more information", but I think it may be helpful to explain the reasoning and thinking here.

The Baseline Requirements and Root Program policies exist for a purpose: to provide a consistent set of expectations for CAs, which meet the security needs of the products using or operating those policies. As these policies tend to call out, a CA may be removed (distrusted) for any reason or no reason at all - it's entirely at the Program's discretion. That said, history tends to largely see removals for patterns of issues that, in aggregate, demonstrate an ongoing and significant risk to users and the Internet at large, although there have been CAs removed for single incidents in the past - such as key compromise or issuing MITM certificates.

As a CA, the risk is that any and every incident may lead to the CA's removal, and thus the best path to avoid that is to not have incidents in the first place. Further, a CA with a pattern of incidents is not wrong to be even more careful when it comes to presenting new incidents, especially if they realize that they share similar root causes or further demonstrate problematic patterns. That's not to say that if you only had a single incident, you wouldn't be removed - as the policies capture, any reason and no reason - but on the balance, it has historically tended to be less likely that first-time incidents lead to removal.

When incidents happen, it becomes necessary for Root Programs and the communities they represent or collaborate with to evaluate the details of the incident, as part of making a determination about what next steps are appropriate. This involves investigating what the underlying root causes are, both to ensure that the CA with the incident understands the significance, the severity, and the steps to remediate it, and to help the industry at large develop and learn best practices, to prevent future incidents.
We're not the only industry to do this - in many ways, it's borrowed from the aviation industry, which recognizes that critical safety functions deserve thoughtful and detailed analysis to prevent harm coming to those that trust in them. Incident reports also serve to triage the issues - to work and identify the risks and make sure they're being mitigated in a timely fashion. Sometimes, the mitigation of risk may be to remove trust in the CA; other times, there may be less significant steps that can be taken to address both the immediate problem and the underlying issues.

DigiCert is now in a precarious position. As a CA, it knows that every one of its Subscribers has agreed, in some legally binding form, that if the CA has misissued a certificate, it MUST be revoked within 24 hours (or, very recently, and only in some cases, 5 days). The CA has a duty and obligation to their customers, the Subscribers, to make sure that they understand this. This is not about a punitive measure or punishing those users for something their CA did - it's because the fundamental and inherent risk is that there are incidents where certificates will need to be replaced in as little as 24 hours, up to and including trust in the CA being removed. To go back to that aviation analogy, the reason planes have maintenance schedules is not because they're going to completely come unglued and fall apart if you miss that maintenance schedule by a day - but because of the severe and significant harm that comes about from having no maintenance schedule at all, or even simply one that just isn't suitable for the risks (to life, property, and safety). Matt Palmer's reply earlier in this thread further expands on some of the other risks here and the hazards that come with them.

At the same time, DigiCert is, on behalf of their customers, saying that even though both DigiCert and their customers agreed to the 24-hour revocation rule, there are circumstances and situations that make that risky.
Despite being an industry standard (as captured in the Baseline Requirements), and despite these agreements, DigiCert is concerned that there are consequences for these customers that did not take adequate precautions to meet the expectations they agreed to, and is trying to perform a risk analysis. Further, they're looking for feedback from the community to check their analysis that the risk - the disruption to their customers - is significant enough to warrant the immediate risk of not revoking, the business risk to DigiCert, and the lasting risk to the ecosystem in intentionally violating the BRs.

It's not my intent to sound harsh, but to make sure it's clearly and unambiguously stated as to what's happening. The reason for doing this is that, on the balance, this seems to be exactly the recommendation in https://wiki.mozilla.org/CA/Responding_To_An_Incident . This is called out explicitly in the section on Revocation, which instructs the CA to perform a risk analysis, develop a report, and devise a plan and timeline for remediation. Further, this analysis should consider feedback of third-parties, calling out explicitly both the CA's auditor and Root Stores, as a means of checking that the analysis is balancing the right tradeoffs, and that the plan is reasonable.

When a CA reports an incident, there is a discussion about what certificates were impacted and the CA's plan and timeline to remediate them - with the standing expectation being immediate revocation absent some otherwise demonstrable exigent risk. These plans factor into how the incident is responded to by the Root Program - for example, the plan may have inappropriately balanced the risk, they may have outright misrepresented it, they may have misunderstood or misled the community on the size and scope of the issue, etc. Further, even if a plan is agreed to as being acceptable (i.e.
the incident not leading outright to discussions of distrust), the incident is not actually closed out until the CA has demonstrated the successful execution against that plan.

I know this message is long, and much of it is stuff you know (but with which others following may be unfamiliar), but it gets both to the heart of the request you're making and the key expectations to be able to respond. You want to know whether, if this incident were filed, it would lead to a discussion of distrust in some form, whether individually or in the collective whole of the issues that DigiCert has had over the past several years. The only reason we're even discussing this incident, specifically, is because it relates to revocation following a previous incident (underscores), which is the only thing acknowledged as even being up for discussion or risk assessment by CAs. To be unambiguously clear, this would be a wholly inappropriate request for any other form of BR violation, but because it's specifically about balancing revocation and risk, it is allowed, for now.

In order to answer that, we need to know:

1) What's the scope of the issue?
2) What are the risks, as identified by DigiCert, and are they meaningfully explained?
3) What's the concrete plan for remediation being presented?

As it stands, it sounds like you've provided #1, which is Question 4 on the incident report template. As called out by Wayne, #2 seems missing, and that's captured by Question 6 on the incident report template, combined with the facts and details from Question 2. Most concerning to me, however, is that I can't find an answer to #3 - which is what Question 7 on the template is trying to help identify.

These are things that only DigiCert can answer, and like any other CA, it needs to provide sufficient detail to demonstrate that the issues are understood and being meaningfully addressed, and that opportunities to improve are actively being pursued.
Please don't think of this as punishing DigiCert for even asking. I think it's commendable that, for the sole topic of revocation, DigiCert is taking steps to engage in the risk analysis early, and publicly. You're not the first CA to do so - other CAs have shared remediation plans regarding, for example, TLS validation methods, and those too provided ways to balance risk and measure progress. That said, as I mentioned earlier, I think that going into 2019, we collectively, and CAs particularly, need to be taking steps to prevent these conversations from ever being necessary, and, fortunately or unfortunately for DigiCert, this places y'all in a unique position of having both the opportunity to use this long-standing and existing practice and high expectations on how to meaningfully ensure this process never has to happen again.

All of this is said to make it clear that #3 - the concrete plan - not only needs to include the remediation plan for these specific certs to be revoked, and concrete dates and measurable milestones to see how well DigiCert is progressing on that, but also needs to provide details as to how DigiCert is taking steps to ensure that their customers do not find themselves in these positions going forward. For example, a commitment to open, standards-based automation solutions provides an interoperable, industry-wide way for such customers to ensure certificates are replaced in a timely fashion, whether because the issuing CA needed to reissue, or because the issuing CA was no longer trusted. Similarly, one could imagine that a plan also includes a communication plan to existing Subscribers to remind them of the details of the Subscriber Agreement, which is industry standard and applies to all CAs, in requiring timely revocation, and providing resources to help those customers prepare.
These are just two examples that, from the limited details provided, seem to apply, but I expect that as the questions Wayne highlighted about the risk analysis are answered, it may be that others are identified as well. And that's the purpose the incident process serves.

On Thu, Dec 20, 2018 at 12:55 AM Jeremy Rowley wrote:

> Done:
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=1515564
>
> It ended up being about 1200 certs total that we are hearing can’t be
> replaced because of blackout periods.

