On Thu, Nov 29, 2018 at 4:03 PM Dimitris Zacharopoulos via
dev-security-policy <[email protected]> wrote:

> I didn't want to hijack the thread so here's a new one.
>
>
> Times and circumstances change.


You have to demonstrate that.

> When I brought this up at the Server
> Certificate Working Group of the CA/B Forum
> (https://cabforum.org/pipermail/servercert-wg/2018-September/000165.html),
> there was no open disagreement from CAs.


Look at the discussion during Wayne’s ballot. Look at the discussion back
when it was Jeremy’s ballot. The proposal was as simple as could be -
modeled after 9.16.3 of the BRs. It would have allowed for a longer period
- NOT an unbounded period, which is grossly negligent for publicly trusted
CAs.

> However, think about CAs that
> decide to extend the 5-days (at their own risk) because of extenuating
> circumstances. Doesn't this community want to know what these
> circumstances are and evaluate the gravity (or not) of the situation?
> The only way this could happen in a consistent way among CAs would be to
> require it in some kind of policy.


This already happens. This is a matter of the CA violating any contracts or
policies of the root store it is in, and is already being handled by those
root stores - e.g. misissuance reports. What you’re describing as a problem
is already solved, as are the expectations for CAs - that violating
requirements is a path to distrust.

The only “problem” you’re solving is giving CAs more time, and there is
zero demonstrable evidence, to date, about that being necessary or good -
and rich and ample evidence of it being bad.

> > Phrased differently: You don't think large organizations are currently
> > capable, and believe the rest of the industry should accommodate that.
>
> "Tolerate" would probably be the word I'd use instead of "accommodate".


I chose accommodate, because you’d like the entire world to take on
systemic risk - and it is indeed systemic risk, to users especially - to
benefit some large companies.

Why stop with revocation, though? Why not just let CAs define their own
validation methods if they think they’re equivalent? After all, if we can
trust CAs to make good judgements on revocation, why can’t we also trust
them with validation? Some large companies struggle with our existing
validation methods, why can’t we accommodate them?

That was exactly one of the arguments made against restricting validation
methods.

As I said, I think this discussion will not accomplish anything productive
without a structured analysis of the data. Not anecdata from one or two
incidents, but holistic - because for every 1 real need, there may have
been 9,999 unnecessary delays in revocation with real risk.

How do CAs provide this? For *all* revocations, provide meaningful data. I
do not see there being any value to discussing further extensions until we
have systemic transparency in place, and I do not see any good coming from
trying to change at the same time as placing that systemic transparency in
place, because there’s no way to measure the (negative) impact such change
would have.

>
> > Do you believe these organizations could respond within 5 days if
> > their internet connectivity was lost?
>
> I think there is different impact. Losing network connectivity would
> have "real" and large (i.e. all RPs) impact compared to installing a

certificate with -say- 65 characters in the OU field which may cause
> very few problems to some RPs that want to use a certain web site.


So you do believe organizations are capable of making timely changes when
necessary, and thus we aren’t discussing capabilities, but perceived
necessity. And because some organizations have been misled as to the role
of CAs, and thus don’t feel it’s necessary, they don’t feel they should
have to use that capability.

I’m not terribly sympathetic to that at all. As you mention, they can
respond when all RPs are affected, so they can respond when their
certificate is misissued and thus revoked.

> You describe it as a black/white issue. I understand your argument that
> other control areas will likely have issues but it always comes down to
> what impact and what damage these failed controls can produce. Layered
> controls and compensating controls in critical areas usually lower the
> risk of severe impact. The Internet is probably safe and will not break
> if for example a certificate with 65-character OU is used on a public
> web site. It's not the same as a CA issuing SHA1 Certificates with
> collision risk.


It absolutely is, and we have seen this time and time again. The CAs most
likely to argue the position you’re taking are the CAs that have had the
most issues.

Do we agree, at least, that any CA violating the BRs or Root Policies puts
the Internet ecosystem at risk?

It seems the core of your argument is how much risk should be acceptable,
and the answer is none. Zero. The point of postmortems is to get us to a
point where, as an industry, we’ve taken every available step to reduce and
eliminate that risk, by learning from our collective mistakes. Lives and
businesses are on the line - a single mistake can cost billions - and
there’s no excuse for just shrugging and saying “well, yanno, there’s risk
and there’s risk”.

Go read
https://zakird.com/papers/zlint.pdf to see a systematic, thorough analysis
that supports what I described to you, and disagrees with your framing. We
know what the warning signs are - and it’s the continued framing of “low”
risk that collectively presents “severe” risk.


>
> >
> > Second, it presumes (incorrectly) that interoperability is not
> > something valuable. That is, if say the three existing, most popular
> > implementations all do not check whether or not it's longer than 64
> > characters (for example), and a fourth implementation would like to
> > come along, they cannot read the relevant standards and implement
> > something interoperable. This is because 'interoperability' is being
> > redefined as 'ignoring' the standard - which defeats the purposes of
> > standards to begin with. These choices - to permit deviations -
> > create risks for the entire ecosystem, because there's no longer
> > interoperability. This is equally captured in
> > https://tools.ietf.org/html/draft-iab-protocol-maintenance-01
> >
> > The premise to all of this is that "CAs shouldn't have to follow
> > rules, browsers should just enforce them," which is shocking and
> > unfortunate. It's like saying "It's OK to lie about whatever you want,
> > as long as you don't get caught" - no, that line of thinking is just
> > as problematic for morality as it is for technical interoperability.
> > CAs that routinely violate the standards create risk, because they
> > have full trust on the Internet. If the argument is that the CA's
> > actions (of accidentally or deliberately introducing risk) are the
> > problem, but that we shouldn't worry about correcting the individual
> > certificate, that entirely misses the point that without correcting
> > the certificate, there's zero incentive to actually follow the
> > standards, and as a result, that creates risk for everyone.
> > Revocation, if you will, is the "less worse" alternative to complete
> > distrust - it only affects that single certificate, rather than every
> > one of the certificates the CA has issued. The alternative - not
> > revoking - simply says that it's better to look at distrust options,
> > and that's more risk for everyone.
> >
>
> I absolutely agree that interoperability is something valuable that
> should be pursued by the ecosystem. Browsers and the majority of CAs
> work in that direction. It's just the fact that if a browser strictly
> enforces a requirement from a standard (e.g. rejects a certificate that
> has an OU field with more than 64 characters), it makes a huge
> difference towards the goal of interoperability compared to a CA that
> just issues certificates with a max of 64 characters in the OU. If browsers
> enforced these rules, the difference would be so big that the
> problematic certificate would be immediately discovered by the
> Subscriber, who would complain to the CA and the Certificate would most
> likely be revoked immediately since it wouldn't be usable.


I literally provided you an explanation for why what you’re describing is
problematic and unreasonable. Please do re-read it. In a new system, sure,
that’d be great - but the existing system absolutely penalizes first movers.

Look at SC12 as an example. CAs would really like browsers to make that
change, because then they can have their customers blame browsers for their
misissuance. The customer is not going to say “Guess I should replace my
cert”, but rather, blame the browser. The links I provided showed how CAs’
widespread disregard for the standards created real compatibility and
security issues - and a browser just rejecting them doesn’t actually fix
it, because the site says “well, it works in other browsers, so the bug must
be the browser’s, not mine.”

> What I meant to say in my original argument is that the "damage" created
> by a certificate that fails to strictly comply with RFC5280 and the rest
> of the X.* standards, as long as popular browsers "allow it", is
> primarily an issue between a Subscriber (that maintains a web site), and
> the particular Relying Parties that want to establish a secure
> connection to that web site. That's not the entire Internet. This is why
> I compared it with "a situation where a site operator forgets to send
> the intermediate CA Certificate in the chain. These particular RPs will
> fail to get TLS working when they visit the Subscriber's web site".


It’s a perfect example of why your argument DOESN’T work. As Mozilla has
shared in the CA/B Forum, people don’t fix their site - they blame the
browser, and keep on with the brokenness. Firefox is the one having to
change to “accommodate” that.

> Perhaps I have misunderstood your argument, but when we are discussing
> revocation timelines, it looks a little extreme to say that a CA
> claiming "some important reasons" (I'm not saying whether they are valid
> reasons or not) for delaying a certificate revocation has zero
> incentive to follow the standards.


It isn’t extreme, because even the incident reports from 2014/2015 show
exactly this argument being made. Your arguments themselves continue to
show that, by suggesting that “only” the site is impacted. And yet, if
every site is doing it because “only” that site is impacted, you have the
whole ecosystem doing it.

This myopic view of trying to assess risk per-certificate is inherently
non-scalable. You haven’t actually proposed any way to address that. What
happens when a CA is doing 100 “exceptional” non-revocations? What about
10,000? We’ve seen examples of both discussed - so nothing is new here. Do
we make CAs also pay penalty fees, so that the community can ensure there
is adequate staffing to investigate and review this? If we do that, what’s
to prevent CAs from just seeing that as buying indulgences?

Your whole proposal breaks down at scale. It’s like asking “What’s the harm
if I start stealing candy bars - after all, it’s only a candy bar?” -
without actually acknowledging the consequences of normalizing that
behavior. It tries to frame the conversation as being about a $1 candy,
which, while appealing, isn’t actually what is being discussed.

Maybe you’re blinded by optimism and faith in CAs. I think if you take a
more realistic, grounded, and holistic view of the ecosystem - one that
considers we were where you propose to go 8 years ago (and it was
disastrous for the ecosystem), one that considers this is a shared commons,
and one that acknowledges the misaligned incentives - you would realize we
already know how and why this sort of suggestion doesn’t actually work in
practice, because we have been there, done that.

> > Finally, CAs are terrible at assessing the risk to RPs. For example,
> > negative serial numbers were prolific prior to the linters, and those
> > have issues inasmuch as they are, for some systems, irrevocable.
> > This is because those systems implemented the standards correctly -
> > serials are positive INTEGERs - yet had to account for the fact that
> > CAs are improperly encoding them, such as by "making" them positive
> > (adding the leading zero). This leading zero then doesn't get stripped
> > off when looking up by Issuer & Serial Number, because they're using
> > the "spec-correct" serial rather than the "issuer-broken" serial.
> > That's an example where the certificate "works", no report is filed,
> > but the security and ecosystem properties are fatally compromised. The
> > alternatives for such implementation are:
> > 1) Reject such certificates (but see above about market forces and
> > interoperability)
> > 2) Correct both the certificate and the CRL/OCSP serial number (which
> > then creates risk because you're not actually checking _any_
> > certificates true serial)
> > 3) Allow negative serial numbers (which then makes it harder for
> > others to do #1)
> >
> > As I said, CAs have been terrible at assessing risk to the ecosystem
> > for their decisions. The page at
> >
> > https://wiki.mozilla.org/SecurityEngineering/mozpkix-testing#Things_for_CAs_to_Fix
> > shows how badly such interoperability failures harm improvements - for example,
> > all of these hacks that Mozilla had to add in order to ship a more
> > secure, more efficient certificate verifier.
>
> As I said earlier, times change. The bar is raised, this industry
> matures day-after-day, things are hopefully improving (security-wise).


You said that, without any systemic data, without any support. “Times
change” may even be true, but having the same conversation tomorrow that we
had today isn’t productive in the least.

I disagree that we’ve seen systemic improvements as a whole. There are a
few CAs trying to do better, but the incident reporting of today clearly
shows exactly what I’m saying - that the industry has not actually matured
as you suggest. What has changed has largely been driven by those outside
CAs - whether those who were wanting to become CAs (Amazon with certlint)
or those analyzing CAs’ failures (ZLint).
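
To make the negative-serial example quoted above concrete, here’s a short
sketch (Python; the serial bytes and the index are hypothetical) of why such
a certificate can end up effectively irrevocable for a spec-correct client:

    # DER INTEGERs are two's complement: if the first content byte has
    # its high bit set, the value is negative. Serial numbers must be
    # positive, so a spec-correct encoder prepends 0x00 - broken CAs
    # omitted it.
    issued = bytes([0xD4, 0x7F, 0x12, 0x9A])  # hypothetical serial bytes
    assert int.from_bytes(issued, "big", signed=True) < 0  # negative!

    # A tolerant client "repairs" the serial by reading it as unsigned...
    value = int.from_bytes(issued, "big", signed=False)

    # ...and the spec-correct re-encoding gains a leading 0x00 byte:
    repaired = value.to_bytes((value.bit_length() + 8) // 8, "big",
                              signed=True)
    assert repaired == b"\x00" + issued

    # A lookup keyed on (issuer, serial bytes) now misses: the CRL/OCSP
    # side indexes the broken bytes the CA actually issued, the client
    # asks about the repaired bytes, and the revocation is never seen.
    revocation_index = {issued: "revoked"}  # hypothetical responder index
    print(revocation_index.get(repaired, "not found"))  # -> not found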

> In conclusion, after repeatedly seeing CAs requesting or effectively
> taking more time to revoke certificates than the existing requirements allow,
> I believe that a policy rule that would require CAs to disclose
> revocation cases requiring more than 5 days to complete (i.e. revoke the
> certificate), provided that the CA submits risk analysis information
> after working with the affected Subscriber(s), is a reasonable way forward.


I think it is grossly negligent and irresponsible, and is only reasonable
if one ignores the past two decades (such as by glibly saying “times
change”). A proposal based on submitting risk analysis merely outsources
the costs from the Subscriber onto this community and RPs in general - who
could easily become consumed with reading thousands upon thousands of
these. Such an act is incredibly hostile to meaningful trust in CAs and the
ecosystem.

Far more compelling is to reduce the window in which CAs can “go rogue” and
not revoke, by reducing the overall certificate lifetime. By improving the
rate at which certificates are replaced, the “hardship” you spoke to
(though you seemingly agree it’s not actually there) can be reduced. This can
be done without introducing the need for costly, and subjective, risk
assessments or “exceptions”.
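
As a rough back-of-the-envelope illustration (the 825-day figure is the
current BR maximum validity; the 90-day cap is a hypothetical alternative,
not a proposal):

    # A misissued certificate that a CA declines to revoke stays trusted
    # until it expires, so the maximum validity period bounds the
    # worst-case exposure window.
    for max_validity_days in (825, 90):
        print(f"{max_validity_days:>3}-day maximum validity -> "
              f"worst-case exposure ~{max_validity_days / 30:.0f} months")
    # -> 825-day maximum validity -> worst-case exposure ~28 months
    # ->  90-day maximum validity -> worst-case exposure ~3 months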

In any event, I think it’s unproductive to try to bring this conversation
up without concrete data. If multiple CAs committed to publishing all of
their revocation data in a systemic way - reasons, hardships, etc. (NOT just
the exceptional cases) - and committed to making funds available to be used
to rigorously analyze this (e.g. funding Mozilla to hire someone for this,
funding peer reviewed papers) - it might be worth revisiting. Then we could
have concrete data that could, for example, show that these “hardships” are
one-in-a-million (certs), and more reflective of poor organizational controls
by CAs and Subscribers, rather than a systemic problem to address.