CA disclosure of revocations that exceed 5 days [Was: Re: Incident report D-TRUST: syntax error in one tls certificate]

Dimitris Zacharopoulos via dev-security-policy Thu, 29 Nov 2018 13:04:15 -0800

I didn't want to hijack the thread so here's a new one.


On 29/11/2018 6:39 PM, Ryan Sleevi wrote:

On Thu, Nov 29, 2018 at 2:16 AM Dimitris Zacharopoulos wrote:
    Mandating that CAs disclose revocation situations that exceed the
    5-day requirement with some risk analysis information, might be a
    good place to start.
This was proposed several times by Google in the Forum, and consistently rejected, unfortunately.

Times and circumstances change. When I brought this up at the Server Certificate Working Group of the CA/B Forum (https://cabforum.org/pipermail/servercert-wg/2018-September/000165.html), there was no open disagreement from CAs. However, think about CAs that decide to extend the 5 days (at their own risk) because of extenuating circumstances. Doesn't this community want to know what these circumstances are and evaluate the gravity (or not) of the situation? The only way this could happen in a consistent way among CAs would be to require it in some kind of policy.

This list has seen disclosures of revocation cases from CAs, mainly as part of incident reports. What I understand as disclosure is the fact that CAs shared that certain Subscribers (we know these subscribers because their Certificates were disclosed as part of the incident report) would be damaged if the mis-issued certificates were revoked within 24 hours. Now, depending on the circumstances this might be extended to 5 days.

    I don't consider 5 days (they are not even working days) to be
    adequate warning period to a large organization with slow reflexes
    and long procedures.
Phrased differently: You don't think large organizations are currently capable, and believe the rest of the industry should accommodate that.


"Tolerate" would probably be the word I'd use instead of "accommodate".

Do you believe these organizations could respond within 5 days if their internet connectivity was lost?

I think there is a different impact. Losing network connectivity would have "real" and large (i.e. all RPs) impact compared to installing a certificate with, say, 65 characters in the OU field, which may cause very few problems for some RPs that want to use a certain web site.

    For example, if many CAs violate the 5-day rule for revocations
    related to improper subject information encoding, out of range,
    wrong syntax and that sort, Mozilla or the BRs might decide to have
    a separate category with a different time frame and/or different
    actions.
Given the security risks in this, I think this is extremely harmful to the ecosystem and to users.
    It is not the first time we talk about this and it might be worth
    exploring further.
I don't think any of the facts have changed. We've discussed for several years that CAs have the opportunity to provide this information, and haven't, so I don't think it's at all proper to suggest starting a conversation without structured data. CAs that are passionate about this could have supported such efforts in the Forum to provide this information, or could have demonstrated doing so on their own. I don't think it would at all be productive to discuss these situations in abstract hypotheticals, as some of the discussions here try to do - without data, that would be an extremely unproductive use of time.

There were voices during the SC6 ballot discussion that wanted to extend the 5 days to something more. We continuously see CAs that either detect or learn about having mis-issued Certificates, and that fail to revoke within 24 hours or even 5 days because their Subscribers have problems and the RPs would be left with no service until the certificates were replaced. I don't think we are having a hypothetical discussion; we have seen real cases being disclosed in m.d.s.p., but it would be important to have a policy in place to require disclosure of more information. Perhaps that would deter CAs from revoking past the 5 days if they don't have strong arguments to support their decisions in public.

    As a general comment, IMHO when we talk about RP risk when a CA
    issues a Certificate with -say- longer than 64 characters in an OU
    field, that would only pose risk to Relying Parties *that want to
    interact with that particular Subscriber*, not the entire Internet.
No. This is demonstrably and factually wrong.
First, we already know that technical errors are a strong sign that the policies and practices themselves are not being followed - both the validation activities and the issuance activities result from the CA following its practices and procedures. If a CA is not following its practices and procedures, that's a security risk to the Internet, full stop.

You describe it as a black/white issue. I understand your argument that other control areas will likely have issues but it always comes down to what impact and what damage these failed controls can produce. Layered controls and compensating controls in critical areas usually lower the risk of severe impact. The Internet is probably safe and will not break if for example a certificate with 65-character OU is used on a public web site. It's not the same as a CA issuing SHA1 Certificates with collision risk.

Second, it presumes (incorrectly) that interoperability is not something valuable. That is, if say the three existing, most popular implementations all do not check whether or not it's longer than 64 characters (for example), and a fourth implementation would like to come along, they cannot read the relevant standards and implement something interoperable. This is because 'interoperability' is being redefined as 'ignoring' the standard - which defeats the purposes of standards to begin with. These choices - to permit deviations - create risks for the entire ecosystem, because there's no longer interoperability. This is equally captured in https://tools.ietf.org/html/draft-iab-protocol-maintenance-01
The premise to all of this is that "CAs shouldn't have to follow rules, browsers should just enforce them," which is shocking and unfortunate. It's like saying "It's OK to lie about whatever you want, as long as you don't get caught" - no, that line of thinking is just as problematic for morality as it is for technical interoperability. CAs that routinely violate the standards create risk, because they have full trust on the Internet. If the argument is that the CA's actions (of accidentally or deliberately introducing risk) are the problem, but that we shouldn't worry about correcting the individual certificate, that entirely misses the point that without correcting the certificate, there's zero incentive to actually follow the standards, and as a result, that creates risk for everyone. Revocation, if you will, is the "less worse" alternative to complete distrust - it only affects that single certificate, rather than every one of the certificates the CA has issued. The alternative - not revoking - simply says that it's better to look at distrust options, and that's more risk for everyone.

I absolutely agree that interoperability is something valuable that should be pursued by the ecosystem. Browsers and the majority of CAs work in that direction. It's just the fact that if a browser strictly enforces a requirement from a standard (e.g. rejects a certificate that has an OU field with more than 64 characters), it makes a huge difference towards the goal of interoperability compared to a CA that just issues certificates with a maximum of 64 characters in the OU. If browsers enforced these rules, the difference would be so big that the problematic certificate would be immediately discovered by the Subscriber, who would complain to the CA and the Certificate would most likely be revoked immediately since it wouldn't be usable.
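
To illustrate the kind of strict check mentioned above, here is a minimal Python sketch. It assumes the OU value has already been extracted from the certificate as a string; the function name and the constant name are hypothetical and not taken from any browser or linter. The 64-character upper bound is the ub-organizational-unit-name value defined in RFC 5280, Appendix A.

```python
# Upper bound for organizationalUnitName (ub-organizational-unit-name in RFC 5280, Appendix A).
UB_ORGANIZATIONAL_UNIT_NAME = 64


def ou_within_bound(ou: str) -> bool:
    """Return True if the OU value respects SIZE (1..64); hypothetical helper."""
    return 1 <= len(ou) <= UB_ORGANIZATIONAL_UNIT_NAME


# A strict client would reject the second value; a lenient one would accept both.
print(ou_within_bound("A" * 64))
print(ou_within_bound("A" * 65))
```

A client that applies such a check makes the problem visible to the Subscriber at once, while a client that skips it leaves the error unnoticed until someone inspects the certificate.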

What I meant to say in my original argument is that the "damage" created by a certificate that fails to strictly comply with RFC 5280 and the rest of the X.* standards, as long as popular browsers "allow it", is primarily an issue between a Subscriber (that maintains a web site), and the particular Relying Parties that want to establish a secure connection to that web site. That's not the entire Internet. This is why I compared it with "a situation where a site operator forgets to send the intermediate CA Certificate in the chain. These particular RPs will fail to get TLS working when they visit the Subscriber's web site".

Perhaps I have misunderstood your argument, but when we are discussing revocation timelines, it looks a little extreme to say that a CA claiming "some important reasons" (I'm not saying if they are valid reasons or not) for delaying a certificate revocation has zero incentive to follow the standards.

Finally, CAs are terrible at assessing the risk to RPs. For example, negative serial numbers were prolific prior to the linters, and those have issues in as much as they are, for some systems, irrevocable. This is because those systems implemented the standards correctly - serials are positive INTEGERs - yet had to account for the fact that CAs are improperly encoding them, such as by "making" them positive (adding the leading zero). This leading zero then doesn't get stripped off when looking up by Issuer & Serial Number, because they're using the "spec-correct" serial rather than the "issuer-broken" serial. That's an example where the certificate "works", no report is filed, but the security and ecosystem properties are fatally compromised. The alternatives for such implementation are:
1) Reject such certificates (but see above about market forces and interoperability)
2) Correct both the certificate and the CRL/OCSP serial number (which then creates risk because you're not actually checking _any_ certificate's true serial)
3) Allow negative serial numbers (which then makes it harder for others to do #1)
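
The serial number mismatch described above can be sketched in Python. This is a simplified illustration, not the behaviour of any particular verifier: it assumes revocation data is keyed by the issuer name and the raw INTEGER content octets exactly as the CA encoded them, and the helper names, the issuer name "Example CA" and the serial value are all hypothetical.

```python
def der_integer_value(content: bytes) -> int:
    """Decode DER INTEGER content octets (big-endian two's complement)."""
    return int.from_bytes(content, "big", signed=True)


def make_positive(content: bytes) -> bytes:
    """Hypothetical client workaround: prepend 0x00 when the high bit is set."""
    if content and content[0] & 0x80:
        return b"\x00" + content
    return content


# "Issuer-broken" serial: high bit set with no leading zero, so it decodes as negative.
issuer_broken = bytes.fromhex("8f01")
# "Spec-correct" serial as seen by a client that forces the value to be positive.
spec_correct = make_positive(issuer_broken)

print(der_integer_value(issuer_broken))  # -28927
print(der_integer_value(spec_correct))   # 36609

# Assumption: the CA's revocation data carries the same broken encoding.
revoked = {("Example CA", issuer_broken)}


def is_revoked(issuer: str, serial_content: bytes) -> bool:
    """Look up revocation status by issuer and serial content octets."""
    return (issuer, serial_content) in revoked


print(is_revoked("Example CA", issuer_broken))  # True
print(is_revoked("Example CA", spec_correct))   # False: the revocation is missed
```

Under these assumptions the certificate still "works" for the client, but a lookup with the corrected serial never matches the revocation entry, which is the sense in which such a certificate is irrevocable for that system.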
As I said, CAs have been terrible at assessing risk to the ecosystem for their decisions. The page at https://wiki.mozilla.org/SecurityEngineering/mozpkix-testing#Things_for_CAs_to_Fix shows how badly such interoperability harms improvements - for example, all of these hacks that Mozilla had to add in order to ship a more secure, more efficient certificate verifier.

As I said earlier, times change. The bar is raised, this industry matures day-after-day, things are hopefully improving (security-wise). There is certainly more security awareness today for this ecosystem than there was 5 or 10 years ago. Specifically for these "past sins", we have seen browsers using telemetry to see how many certificates fail to follow specific requirements and should normally see these numbers decrease over time. Once these numbers reach an acceptably low level, we usually see code changes that enforce these requirements and remove the "hacks". Of course, this is a different topic for discussion.

In conclusion, after repeatedly seeing CAs requesting or effectively taking more time to revoke certificates than the existing requirements allow, I believe that a policy rule that would require CAs to disclose revocation cases requiring more than 5 days to complete (i.e. revoke the certificate), provided that the CA submits risk analysis information after working with the affected Subscriber(s), is a reasonable way forward.



Dimitris.

