RE: Underscore characters

Jeremy Rowley via dev-security-policy Thu, 27 Dec 2018 19:02:01 -0800

The risk Matt identified is too nebulous an issue to address, tbh. How do 
you address a moral issue?  The only way I can think of to address the moral 
issue is to say “we promise to be good”. But the weight that carries depends on 
how much you trust the actor. If you trust the actor, then the moral issue is 
addressed. If you don’t trust the actor, the moral issue is not addressed. If you 
or Matt can identify a specific threat you’d like me to address about the moral 
issue, I’ll do my best to respond.

*       What happens is that you ask why there is risk of outage to begin with 
and what can be done to improve going forward? Let’s assume you do revoke, and 
it causes an outage - is DigiCert taking steps to ensure no customer of theirs 
is ever faced with that risk? If so, what are those steps?

Yeah – there are several things we can do to improve going forward:

1.      Communicate better with the customers. The first mistake was waiting 
until we had good data to communicate with the customers. This delayed 
notification. This was unknown to me at the time, or we would have sent out 
communication prior to the ballot passing. That instruction has been passed 
along (no waiting on these critical issues) plus training.
2.      No more skipping CAB Forum meetings for me. This was easily a 
foreseeable issue because we knew people couldn’t replace in January. I think 
it’s been brought up a half dozen times in the forum at least. I’m not sure why 
we didn’t communicate this in Shanghai. But, the real problem is I didn’t have 
direct knowledge of what was going on. I probably need to be there in person 
each time so we can align the company correctly with what is going on.

I don’t think we can ever take steps to ensure that no customer is ever faced 
with the risk of revoked certs. I’m sure there will be other items that are 
adopted we don’t foresee.  That said, we do promote automation, short-lived 
certs (you can get anything from about 8 hours up through our system), and CT 
logging. I think the biggest surprise on this one was it applied to certs that 
are no longer trusted by Mozilla or Google. 

> This seems to suggest that perhaps other CAs have prepared their customers 
> for revocation. How does this surprise - that no other CA faces this - lead 
> to tangible changes in the business processes? How would this change, if 
> another CA did have the same issue? Surely you can see there are real and 
> fundamental issues that you’re uniquely qualified to help your customers 
> address in ways that we cannot. 

I suppose they did prepare better. Maybe other CAs are just smarter than me? I 
won’t leave that off the table.  I agree that we are uniquely positioned to 
help our customers remediate. Definitely anxious to do that (and are doing so). 

*       Have you analyzed CT, for example, to see why DigiCert is unique? 
Certainly, by sheer volume, it's heavily tilted towards the old Symantec 
infrastructure - and the customers that came over to DigiCert. With those sorts 
of details, how does this change how things were done, or how they will be done?

We do know most of the customers were legacy Symantec, but there are definitely 
some DigiCert customers in there. I think we still continue the same course. 
It’s only been a year from the transition, and we’ve migrated nearly everyone 
off the Symantec infrastructure. Next comes shutting down all the legacy 
Symantec systems. 

*       I’m not trying to pick on y’all - I think it is legitimately good that 
you provided concrete data. Even if you do revoke on Jan 15, this is still 
useful to understand the challenges, but only if this leads to meaningful 
changes. What might those look like?

I appreciate that. I think these are all fair questions, and I’m trying my best 
to answer them. I especially don’t feel picked on since we’re requesting the 
information/decision on what to do.

I don’t know how to answer the question of what changes to make because I was a 
bit blindsided by the decision to revoke the certs. I probably shouldn’t have 
been, considering the conversation at the CAB Forum.  My number one priority 
right now is to shut down all of the legacy Symantec systems. Last year was 
mostly migration of issuance and trying to get the systems up to an expected 
caliber of performance. At the same time we’re introducing industry-standard 
(and above) automation of issuance and deployment systems that we hope will 
help people replace certificates faster. 

*       And this is the framing that I think is incredibly helpful. 
Understanding why customers can’t change, and what steps are being done to 
ensure they can, is hugely useful. Wayne’s questions were to this point - as 
were mine towards understanding the problem from the other side, which are 
steps the CA is taking. As I've repeatedly highlighted from 
https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation , the goal is 
not punishment - but understanding how these issues are being addressed. 

The main blocker for all of these is policy, not technology. I don’t know how 
to solve third party policy decisions, which is why I can’t seem to answer the 
questions. The process of planning a change, getting sign-off, rolling the 
change to stage, getting more sign-off, and then rolling to production with 
final testing combined with the blackout periods is making something that 
should be easy very difficult. I run an agile team at DigiCert so none of these 
are concerns when we roll a change internally. It’s the revocation part that is 
getting people up in arms. The consistent message I’ve gotten from customers is 
that changing domains and certificates requires the same process. It’s just as 
fast to roll out a change to both items as to change just a certificate. The 
built-in CAB Forum 30 day cert requirement isn’t solving the issue because of 
the way they roll changes, not because the 30 day certs aren’t available. 

*       This seems like a significant improvement from “100% of customers can’t”

Definitely an improvement. I’m hoping to get to 100% by the time we hit Jan 
15th. The four I posted (and one more I got more info from today) probably 
won’t. Even within those customers, we’re asking them to identify specifically 
which certificates cannot be replaced in time.

*       I mean, it’s two-fold, right? Any incident can lead to total distrust, 
but it’s also unlikely that a single incident leads to total distrust. The way 
to balance those competing statements is to do what you’re doing - and to be 
transparent. As Matt has highlighted, there’s a huge risk here that this leads 
to a moral hazard - and the best way to mitigate that is to discuss steps being 
taken to reduce that risk going forward, particularly about what a core part of 
the problem statement is - difficulty in revocation.

This isn’t our first incident sadly ☹. It probably won’t be our last. The 
transition from Symantec to DigiCert was... rough.

*       In a number of ways, an unintentional violation is worse than an 
intentional violation. Ignorance is not really an excuse when you hold keys to 
the Internet, and being asleep at the wheel is hugely dangerous. So, if I had 
to pick between an intentional violation and an unintentional (and preventable) 
violation, I'd likely pick intentional. But there's also a huge hazard with 
intentional violations - those reveal potentially systemic issues and a lack of 
good faith, especially if they become common-place. We definitely saw CAs 
perform intentional violations and notify after-the-fact, and that's far, far 
worse than those that notify before intentionally violating (I think every 
post-facto notification for intentional incident has, eventually, led to that 
CA's distrust).

Totally agree. I really don’t want to violate the BRs, and this shouldn’t be 
the norm. I also recognize we don’t want to invite this question for every BR 
change. Maybe better Mozilla guidelines about which requests are acceptable and 
which are not? 

*       So somewhere on the scale of things, we're in a better place than most 
every alternative. But to ensure this is in that 'good faith' side of things, 
understanding what the factors are that have been evaluated, and what steps are 
being taken to prevent this, are significant. As I said, I think the principles 
captured in https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation 
and in the discussion about how at least some of us see this (that it's related 
to underscores incident response) suggests that it's not, in fact, the end of 
the world, or the CA, provided that meaningful data behind the decision to not 
revoke is given, meaningful plans and timelines for resolution are given, and 
meaningful steps to prevent this from ever happening again are given. It 
becomes an incident report, and the result is not a stern lecture - but 
concrete and quantifiable steps as to how to improve.

Thanks Ryan. This post was really nice. Appreciate it.

From: Ryan Sleevi <[email protected]> 
Sent: Thursday, December 27, 2018 7:15 PM
To: Jeremy Rowley <[email protected]>
Cc: James Burton <[email protected]>; Ryan Sleevi <[email protected]>; 
mozilla-dev-security-policy <[email protected]>
Subject: Re: Underscore characters

On Thu, Dec 27, 2018 at 6:56 PM Jeremy Rowley <[email protected]> wrote:

The risk is primarily outages of major sites across the web, including certs 
used in Google wallet. We’re thinking that is a less than desirable result, but 
we weren’t sure how the Mozilla community would feel/react. 

I don’t think that is a particularly helpful framing, to be honest. The risk 
these organizations face here is self-inflicted; regardless of the feeling of 
underscores, there is unquestionably an issue for organizations that cannot 
respond in the BR timeframes, let alone extended ones that extend for months. 
That's a real ecosystem issue, and regardless of the CA these customers partner 
with, an issue that needs both better understanding and, to be honest, better 
prevention.

Matt has spoken at length to the risk to the community, which doesn’t really 
seem to have been acknowledged, let alone met with a proposal for how it will 
be mitigated. I have to ask again - what steps is DigiCert taking to avoid these 
issues going forward?

 We’re still considering revoking all of the certs on Jan 15th based on these 
discussions.  I don’t think we’re asking for leniency (maybe we are if that’s a 
factor?), but I don’t know what happens if you’re faced with causing outages 
vs. compliance.

What happens is that you ask why there is risk of outage to begin with and what 
can be done to improve going forward? Let’s assume you do revoke, and it causes 
an outage - is DigiCert taking steps to ensure no customer of theirs is ever 
faced with that risk? If so, what are those steps?

I started the conversation because I feel like we should be good netizens and 
make people aware of what’s going on instead of just following policy.  I’m 
actually surprised at least one other CA that has issued a large number of 
underscore character certs hasn’t run into the same timing issues.

This seems to suggest that perhaps other CAs have prepared their customers for 
revocation. How does this surprise - that no other CA faces this - lead to 
tangible changes in the business processes? How would this change, if another 
CA did have the same issue? Surely you can see there are real and fundamental 
issues that you’re uniquely qualified to help your customers address in ways 
that we cannot. 

Have you analyzed CT, for example, to see why DigiCert is unique? Certainly, by 
sheer volume, it's heavily tilted towards the old Symantec infrastructure - and 
the customers that came over to DigiCert. With those sorts of details, how does 
this change how things were done, or how they will be done?

I’m not trying to pick on y’all - I think it is legitimately good that you 
provided concrete data. Even if you do revoke on Jan 15, this is still useful 
to understand the challenges, but only if this leads to meaningful changes. 
What might those look like?

Normally, we would just revoke the certs, but there are a significant number of 
certs in the Alexa top 100. We’ve told most customers, “No exception”. I also 
thought it’s better to get the information out there so we can all make 
rational decisions (DigiCert included) if as many facts are known as possible.  

And this is the framing that I think is incredibly helpful. Understanding why 
customers can’t change, and what steps are being done to ensure they can, is 
hugely useful. Wayne’s questions were to this point - as were mine towards 
understanding the problem from the other side, which are steps the CA is 
taking. As I've repeatedly highlighted from 
https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation , the goal is 
not punishment - but understanding how these issues are being addressed. 

We are working with the partners to get the certs revoked before the deadline. 
Most will. 

This seems like a significant improvement from “100% of customers can’t”

By January 15th, I hope there won’t be too many certs left. Unfortunately, by 
then it’s also too late to discuss what happens if the cert is not revoked. I.e., 
what are the benefits of revoking (strict compliance) vs revoking the larger 
impact certs as they are migrated (incident report).  Unfortunately part 2, 
there’s no guidance on whether an incident report means total distrust v. 
something on your audit and a stern lecture. 

I mean, it’s two-fold, right? Any incident can lead to total distrust, but it’s 
also unlikely that a single incident leads to total distrust. The way to 
balance those competing statements is to do what you’re doing - and to be 
transparent. As Matt has highlighted, there’s a huge risk here that this leads 
to a moral hazard - and the best way to mitigate that is to discuss steps being 
taken to reduce that risk going forward, particularly about what a core part of 
the problem statement is - difficulty in revocation.

I’d rather suffer a lecture than take down a top site. Not so willing to 
gamble the whole company. This is why we wanted to have the discussion now, 
despite no violation so far. The response from the browsers is public  - that 
they cannot make that determination. Does that mean we have our answer? Revoke 
is the only acceptable response?   

I mean, the answer has been to repeatedly highlight 
https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation

In a number of ways, an unintentional violation is worse than an intentional 
violation. Ignorance is not really an excuse when you hold keys to the 
Internet, and being asleep at the wheel is hugely dangerous. So, if I had to 
pick between an intentional violation and an unintentional (and preventable) 
violation, I'd likely pick intentional. But there's also a huge hazard with 
intentional violations - those reveal potentially systemic issues and a lack of 
good faith, especially if they become common-place. We definitely saw CAs 
perform intentional violations and notify after-the-fact, and that's far, far 
worse than those that notify before intentionally violating (I think every 
post-facto notification for intentional incident has, eventually, led to that 
CA's distrust).

So somewhere on the scale of things, we're in a better place than most every 
alternative. But to ensure this is in that 'good faith' side of things, 
understanding what the factors are that have been evaluated, and what steps are 
being taken to prevent this, are significant. As I said, I think the principles 
captured in https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation 
and in the discussion about how at least some of us see this (that it's related 
to underscores incident response) suggests that it's not, in fact, the end of 
the world, or the CA, provided that meaningful data behind the decision to not 
revoke is given, meaningful plans and timelines for resolution are given, and 
meaningful steps to prevent this from ever happening again are given. It 
becomes an incident report, and the result is not a stern lecture - but 
concrete and quantifiable steps as to how to improve.


