On Thu, Dec 27, 2018 at 6:56 PM Jeremy Rowley <jeremy.row...@digicert.com> wrote:
> The risk is primarily outages of major sites across the web, including
> certs used in Google wallet. We’re thinking that is a less than desirable
> result, but we weren’t sure how the Mozilla community would feel/react.

I don’t think that is a particularly helpful framing, to be honest. The risk these organizations face here is self-inflicted; regardless of the feeling of underscores, there is unquestionably an issue for organizations that cannot respond in the BR timeframes, let alone extended ones that extend for months. That's a real ecosystem issue, and regardless of the CA these customers partner with, an issue that needs both better understanding and, to be honest, better prevention.

Matt has spoken at length to the risk to the community, which doesn’t really seem like it’s been acknowledged, let alone proposed as to how it will be mitigated. I have to ask again - what steps is DigiCert taking to avoid these issues going forward?

> We’re still considering revoking all of the certs on Jan 15th based on
> these discussions. I don’t think we’re asking for leniency (maybe we are
> if that’s a factor?), but I don’t know what happens if you’re faced with
> causing outages vs. compliance.

What happens is that you ask why there is risk of outage to begin with and what can be done to improve going forward? Let’s assume you do revoke, and it causes an outage - is DigiCert taking steps to ensure no customer of theirs is ever faced with that risk? If so, what are those steps?

> I started the conversation because I feel like we should be good netizans
> and make people aware of what’s going on instead of just following policy.
> I’m actually surprised at least one other CA that has issued a large number
> of underscore character certs hasn’t run into the same timing issues.

This seems to suggest that perhaps other CAs have prepared their customers for revocation.
How does this surprise - that no other CA faces this - lead to tangible changes in the business processes? How would this change, if another CA did have the same issue? Surely you can see there are real and fundamental issues that you’re uniquely qualified to help your customers address in ways that we cannot. Have you analyzed CT, for example, to see why DigiCert is unique? Certainly, by sheer volume, it's heavily tilted towards the old Symantec infrastructure - and the customers that came over to DigiCert. With those sorts of details, how does this change how things were done, or how they will be done?

I’m not trying to pick on y’all - I think it is legitimately good that you provided concrete data. Even if you do revoke on Jan 15, this is still useful to understand the challenges, but only if this leads to meaningful changes. What might those look like?

> Normally, we would just revoke the certs, but there are a significant
> number of certs in the Alexa top 100. We’ve told most customers, “No
> exception”. I also thought it’s better to get the information out there so
> we can all make rational decisions (DigiCert included) if as many facts are
> known as possible.

And this is the framing that I think is incredibly helpful. Understanding why customers can’t change, and what steps are being done to ensure they can, is hugely useful. Wayne’s questions were to this point - as were mine towards understanding the problem from the other side, that is, the steps the CA is taking. As I've repeatedly highlighted from https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation, the goal is not punishment - but understanding how these issues are being addressed.

> We are working with the partners to get the certs revoked before the
> deadline. Most will.

This seems like a significant improvement from “100% of customers can’t”.

> By January 15th, I hope there won’t be too many certs left.
> Unfortunately, by then it’s also too late to discuss what happens if the
> cert is not revoked. Ie – what are the benefits of revoking (strict
> compliance) vs revoking the larger impact certs as they are migrated
> (incident report). Unfortunately part 2, there’s no guidance on whether
> an incident report means total distrust v. something on your audit and a
> stern lecture.

I mean, it’s two-fold, right? Any incident can lead to total distrust, but it’s also unlikely that a single incident leads to total distrust. The way to balance those competing statements is to do what you’re doing - and to be transparent. As Matt has highlighted, there’s a huge risk here that this leads to a moral hazard - and the best way to mitigate that is to discuss steps being taken to reduce that risk going forward, particularly about what a core part of the problem statement is - difficulty in revocation.

> I’d happily suffer a lecture than take down a top site. Not so willing to
> gamble the whole company. This is why we wanted to have the discussion now,
> despite no violation so far. The response from the browsers is public -
> that they cannot make that determination. Does that mean we have our
> answer? Revoke is the only acceptable response?

I mean, the answer has been to repeatedly highlight https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation

In a number of ways, an unintentional violation is worse than an intentional violation. Ignorance is not really an excuse when you hold keys to the Internet, and being asleep at the wheel is hugely dangerous. So, if I had to pick between an intentional violation and an unintentional (and preventable) violation, I'd likely pick intentional. But there's also a huge hazard with intentional violations - those reveal potentially systemic issues and a lack of good faith, especially if they become common-place.
We definitely saw CAs perform intentional violations and notify after-the-fact, and that's far, far worse than those that notify before intentionally violating (I think every post-facto notification for an intentional incident has, eventually, led to that CA's distrust). So somewhere on the scale of things, we're in a better place than most every alternative. But to ensure this is in that 'good faith' side of things, understanding what the factors are that have been evaluated, and what steps are being taken to prevent this, are significant.

As I said, I think the principles captured in https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation and in the discussion about how at least some of us see this (that it's related to underscores incident response) suggest that it's not, in fact, the end of the world, or the CA, provided that meaningful data behind the decision to not revoke is given, meaningful plans and timelines for resolution are given, and meaningful steps to prevent this from ever happening again are given. It becomes an incident report, and the result is not a stern lecture - but concrete and quantifiable steps as to how to improve.
_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy