Re: Feasibility of a binding commitment to revoke before issuance

Jesper Kristensen Thu, 25 Jul 2024 10:35:34 -0700

+1 to what Amir wrote.

Many CAs use the 5 day deadline to give the subscriber 5 days to replace
their certificates. I don't think that is what the 5 days are for. Some
incidents are obvious, and CAs should therefore be able to revoke in 24
hours, but others are less obvious. CAs may sometimes need time to
determine if the certificate was misissued or not, and if it is misissued,
find out how to fix the problem, deploy the fix, and find all the other
certificates that suffer from the same misissuance. I think this is why
they need to have 5 days to cover the edge cases.


Instead of giving CAs more time to revoke, maybe Mozilla could adopt some
of Chrome's "Moving Forward, Together" initiatives that push towards
supporting agility (and therefore fast certificate replacement), such as a
maximum validity of 90 days for end-entity certificates, shorter validity
periods for subordinate CAs, and a requirement that CAs support ACME and ARI.

As a concrete proposal (which I admit might not be fully thought through),
Mozilla could add a requirement that when a CA delays revocation because a
subscriber requested a delay, then every certificate that the CA issues for
the next two years that shares a SAN identifier with the delayed revocation
certificate must have a validity of at most 30 days. This would give
subscribers who failed to design their systems for agility an incentive to
fix that, while it should not be a big burden for subscribers who just had
a bad day because their automation failed them once.
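
To make the proposed rule concrete, here is a rough sketch of how a CA
might compute the validity cap at issuance time. It is only an
illustration under stated assumptions: `DelayedRevocation`,
`max_validity_days`, the reading of "two years" as 730 days, and the
398-day default cap are all made up for the example and are not taken from
any policy text.

```python
from dataclasses import dataclass
from datetime import date, timedelta

PENALTY_WINDOW = timedelta(days=730)  # assumption: "two years" read as 730 days
PENALTY_MAX_DAYS = 30                 # cap taken from the proposal
DEFAULT_MAX_DAYS = 398                # assumption: the CA's normal maximum validity


@dataclass(frozen=True)
class DelayedRevocation:
    """Hypothetical record of a revocation delayed at a subscriber's request."""
    delayed_on: date
    sans: frozenset


def max_validity_days(new_sans, issue_date, history):
    """Return the longest validity (in days) allowed for a new certificate."""
    wanted = {s.lower() for s in new_sans}
    for record in history:
        in_window = record.delayed_on <= issue_date <= record.delayed_on + PENALTY_WINDOW
        shares_san = bool(wanted & {s.lower() for s in record.sans})
        if in_window and shares_san:
            return PENALTY_MAX_DAYS
    return DEFAULT_MAX_DAYS
```

The sketch compares SAN identifiers as case-insensitive exact strings; how
wildcards or parent domains should count is left open, as it is in the
proposal itself.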

Some large cloud providers (e.g. Cloudflare) have backup certificates from
multiple CAs that they can use in case there is a problem with one CA
without having to reissue a new certificate first. Maybe we should promote
that as a best practice? I am not sure how many off-the-shelf ACME clients
have support for this.
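
As a rough picture of what such a setup involves: the server keeps
ready-to-use certificates from more than one CA and switches away from a
CA that is reported to have a problem. The sketch below is an assumption
about how the selection could look, not a description of how Cloudflare or
any ACME client does it; `StandbyCert`, `pick_certificate`, and
`affected_issuers` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StandbyCert:
    issuer: str          # label for the issuing CA
    not_after: datetime  # expiry time, timezone-aware
    path: str            # where the certificate chain is stored


def pick_certificate(pool, affected_issuers, now=None):
    """Pick the first unexpired certificate whose CA is not flagged as affected.

    `pool` is ordered by preference; returns None if nothing is usable.
    """
    now = now or datetime.now(timezone.utc)
    for cert in pool:
        if cert.issuer not in affected_issuers and cert.not_after > now:
            return cert
    return None
```

With a pool holding one certificate each from two CAs, flagging the first
CA as affected makes the function return the second certificate without
any new issuance.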

Maybe we don't need new policies but more resources to review bad incident
reports, and sanctions other than distrust when CAs refuse to deliver. If I
remember correctly, in the last six months DigiCert is the only CA who at
least attempted to comply with "the rationale must be provided on a
per-Subscriber basis."

On Wed, 24 Jul 2024 at 21:45, 'Amir Omidi (aaomidi)' via
[email protected] <[email protected]> wrote:

> Hey Ben,
>
> I think that the suggestion to increase the time frame for revocation from
> 5 to 20 days is dangerous. Here are a couple of issues I have with this:
>
> First: Security Impact Analysis is very difficult. It's arguably harder
> than root cause analysis. The majority of CAs (by count, not issuance) do
> an awful job at root cause analysis. I do not think they are (or, honestly
> speaking, will ever be) at the maturity level to do security impact
> analysis within 24 hours to determine if this is a 24-hour or 20-day
> revocation deadline.
>
> Second: We're effectively going to be left with very few situations that
> necessitate 24-hour revocations. This proposal:
>
>    1. Makes it harder to test out if mass revocations will actually work
>    when they're required.
>    2. Discourages entities from adopting Certificate Lifecycle Management
>    (CLM).
>    3. Makes it significantly more difficult to reduce certificate
>    lifetimes to a 90-day maximum in the future.
>    4. Sacrifices Web PKI security because of a handful of enterprise
>    companies that have the money, and talent to solve this problem internally,
>    but are choosing to invest in ${literally_anything_else} instead.
>
> Third: Holidays, weekends, etc. are not really relevant here either,
> because any of these incidents can become a 24-hour revocation incident
> *anyway*, and if the 24-hour revocation incidents are not happening often enough,
> then CAs will not be ready to execute on a revocation like that. If this is
> too prohibitive for a CA to staff itself so it can handle revocation within
> 24 hours, they should consider not being a CA.
>
> Fourth: Root Program enforcement of the existing policies is weak. Mozilla
> & Apple & Microsoft still have not distrusted Entrust despite the clear
> negligence in their operations. So what happens if a CA doesn't revoke in
> 20 days? Or misses a 24-hour revocation requirement? Any sort of rule
> change here without significantly upping the enforcement is not okay imo.
>
> Fifth: We already have a way for CAs and Subscribers to avoid the need for
> revocation: *Short lived certificates.*
>
> Sixth: The distribution of who benefits and who is hurt by this change is
> interesting. For example, on the CA and subscriber side:
>
>    1. Top CAs (in terms of issuance load), are either fully automated, or
>    have automation integrated with part of their product. Some of these CAs
>    also provide CLM solutions to avoid outages due to CA issues. So they're
>    not really going to benefit from this.
>    2. Majority of subscribers (in terms of numbers of certificates held)
>    have, or are planning to implement CLM into their products. So they don't
>    really get any benefit from this proposal either.
>
> The folks that really benefit from this change are:
>
>    - Boutique CAs that have barely adopted automation for their CA
>    issuances. (e.g. some small CAs, some government CAs, etc)
>    - A handful of enterprise subscribers that are not investing into CLM
>    and are relying on manual work for certificate replacement.
>
> The folks that hurt, quite a bit, from this change are the end users (many
> of which look up to Mozilla to protect them when many other RPs are not).
> This change would make the web less safe for everyone by giving more
> allowances for the *bad* CAs and Subscribers to continue their bad
> behavior.
>
> Anyway, this change encourages more hands-on and non-automated certificate
> lifecycle management. This would be a regression in the ecosystem.
>
> *Alternative Proposal*
>
> This is going to be pretty controversial too: I'd be in favor of removing
> the 5-day category altogether, and require a 24-hour revocation for all
> mis-issuances (probably as a step function, lowering the 120 hour time
> limit by 24 hours every 6 months or something until they're aligned?)
>
> My justification for this is the inverse of the stuff I mentioned above.
> In other words, it forces companies to adopt automation, removes ambiguity
> from the side of CAs, and generally propels the ecosystem forward. This
> also means that we get more assurances that when a Crowdstrike situation
> hits Web PKI, we actually can respond in a reasonable time frame. This
> proposal also significantly simplifies the communications CAs must have
> with their subscribers about why a certificate is being revoked.
>
> Amir
> On Wednesday, July 24, 2024 at 2:36:31 PM UTC-4 Ben Wilson wrote:
>
>> Dear Tim and Matt,
>>
>> Thank you both for your insightful comments and contributions to the
>> ongoing discussion regarding timely certificate revocation. Your
>> perspectives are invaluable as we strive to find balanced and effective
>> solutions to this problem.
>>
>> Tim, your proposal to identify problematic certificates in advance and
>> make this information transparent not only addresses the core issue of
>> preparedness, but also encourages organizations to improve their crypto
>> agility.
>>
>> Matt, your questions and alternative proposal for regular, randomized
>> revocation testing are equally thought-provoking. Regular testing would
>> ensure that processes are robust, and that organizations remain vigilant
>> about their revocation capabilities.
>>
>> Given the complexity and importance of this issue, I would like to keep
>> the discussion alive and invite additional comments from the Mozilla
>> community.
>>
>> Personally, I currently favor extending the timeframe for the revocation
>> of certificates that have no security impact, e.g. to 20 days (exact
>> language TBD – e.g. by adding a new subsection to section 4.9.1.1 of the
>> Baseline Requirements
>> <https://cabforum.org/working-groups/server/baseline-requirements/requirements/>).
>> I understand that extending the timeframe from 5 days to 20 days for
>> some types of revocations might raise questions about the empirical basis
>> for my position, especially concerning our continued preparation for
>> 24-hour revocations when security compromises like we experienced with
>> Heartbleed happen, but here are some points to consider. My review of past
>> Bugzilla incidents shows that many delayed revocations are not related to
>> security issues, but to compliance details that do not pose immediate
>> security risks. We have also received consistent feedback from CAs and
>> subscribers that the 5-day window for these types of revocations is too
>> restrictive and does not reflect the operational realities of many
>> organizations. The current 5-day timeframe does not account for holidays,
>> weekends, and other operational delays. Extending the timeframe provides a
>> more realistic window for organizations to respond without compromising
>> their operational integrity. Some organizations face legal and regulatory
>> hurdles that make immediate revocation challenging, and extending the
>> timeframe can help them comply with both CA/B Forum requirements and local
>> laws. When adopting any security-related measure, such as revocation, a
>> cost-benefit-based risk analysis should be done. The analysis should
>> justify why a 5-day period is necessary when a 20-day period might be just
>> as effective without imposing undue burdens. Finally, extending the
>> timeframe for non-security-related revocations does not hinder preparation
>> for 24-hour revocation timelines for critical security incidents. In fact,
>> it allows organizations to better allocate resources and develop robust
>> processes that can be quickly mobilized in the event of a security
>> compromise.
>>
>> But whatever decision we reach as consensus is good for me--our
>> collective goal should be to find solutions that work best for the entire
>> community, and it would be great if we could come up with some solutions
>> and then recommend them to the Server Certificate Working Group of the
>> CA/Browser Forum. To facilitate this, I propose that we continue to gather
>> more input from the community, and try to understand the different
>> perspectives, which will help us refine suggestions and identify potential
>> challenges and solutions. Everyone’s continued engagement and support are
>> crucial as we work towards a consensus. I encourage everyone in the
>> community to share their thoughts and suggestions to help us develop a
>> robust and effective strategy to improve security while reducing the number
>> of CA incidents that are due to delayed revocation.
>>
>> Thank you once again for your contributions, and I look forward to our
>> continued collaboration on these important issues.
>>
>> Best regards,
>> Ben
>> On Monday, July 15, 2024 at 8:09:59 PM UTC-6 Matt Palmer wrote:
>>
>>> Hi Tim,
>>>
>>> On Mon, Jul 15, 2024 at 09:22:22PM +0000, 'Tim Hollebeek' via
>>> [email protected] wrote:
>>> > If a publicly-trusted certificate is difficult to replace, for various
>>> > regulatory or technical reasons, the real reasons do not magically
>>> > appear when rotation is necessary. But a host of fake reasons are
>>> > likely to arise ("we can't rotate certificates faster because it costs
>>> > money we don't want to spend"). Furthermore, making progress on this
>>> > problem would be greatly assisted by better information about exactly
>>> > which certificates can't be replaced, the timescale on which they CAN
>>> > be replaced, and why.
>>> >
>>> > The world would be better if we all knew, IN ADVANCE, which
>>> > certificates are automatically replaceable, and which aren't. This
>>> > would also greatly streamline operations when replacements are
>>> > necessary, as it removes the burden on making the determinations with
>>> > a ticking clock, which is a situation that doesn't lend itself to
>>> > careful and unbiased evaluations.
>>>
>>> If I'm understanding your proposal correctly, it basically requires
>>> organisations to identify, in advance, certificates which cannot be
>>> replaced in line with the WebPKI requirements.
>>>
>>> If so, while I agree with the motivations (to have more useful
>>> information), I have... questions:
>>>
>>> 1. What is the motivation for an organisation to take the time and
>>> effort to identify all problematic certificates? These organisations
>>> apparently don't have the available resources to fix the current
>>> problems, what will their reaction be to being asked to do even more
>>> work?
>>>
>>> 2. If an organisation does not proactively declare a problematic
>>> certificate as being problematic, what are the consequences at
>>> revocation time? I can't imagine that CAs will be willing to revoke
>>> those certificates even though the organisation has not declared them as
>>> problematic, for the same reasons that those CAs are not willing to
>>> currently revoke problematic certificates.
>>>
>>> 3. If an organisation is capable of proactively identifying problematic
>>> certificates, why issue a WebPKI certificate at all? On its face, a
>>> declaration that a certificate is incapable of being rotated in line
>>> with the requirements of the WebPKI is an admission that the customer is
>>> (or at the very least expects to be) in breach of their subscriber
>>> agreement.
>>>
>>> 4. For certificates that are problematic, why add an extension to a
>>> WebPKI certificate that says "this certificate is non-compliant", rather
>>> than just moving that usage to a private PKI?
>>>
>>> 5. Do you have any reason to believe that CAs and their customers will
>>> even be *willing* to disclose this sort of information? In every
>>> previous incident that comes to mind, the prevailing attitude from CAs
>>> has been to refuse to disclose customer information in any meaningful
>>> fashion. I can understand their reticence there on one level, as a
>>> protection against "customer poaching"[1], and I'd be hesitant for
>>> Mozilla to make it a requirement for CAs to disclose this from an
>>> anti-trust action perspective.
>>>
>>> > I realize this would be a major change to how we do things, but we've
>>> > been having this exact same conversation about certificate replacement
>>> > for pretty much the entire decade I've been involved at CABForum, and
>>> > I think it's time for radical change. If this isn't the right idea, it
>>> > at least gives a sense of the kind of change that is needed to make
>>> > progress here, and I would love to hear any other potential ideas for
>>> > how we finally exit the traffic circle and start moving forward again.
>>>
>>> My proposal is that root programs require CAs to accept revocation
>>> requests from the root programs themselves for randomly-chosen
>>> certificates. At random intervals, a root program sends a (suitably
>>> authenticated) email to the CA's problem reporting address stating "this
>>> certificate should be considered compromised as of this moment, revoke
>>> in line with the BRs". Frequency and volume could be tuned to issuance
>>> volume, with upper and lower bounds as needed to ensure universal
>>> coverage without unduly burdening any particular CA with excessive
>>> administrivia.
>>>
>>> I base this proposal on two factors:
>>>
>>> 1. Regular testing of processes is important to be confident that those
>>> processes work. When I was running the Pwnedkeys Revokinator, I found
>>> plenty of problems with revocation practices at several CAs, resulting
>>> in multiple problem reports. I'd be more than willing to resurrect the
>>> Revokinator to once again analyse revocation processing compliance if I
>>> had confidence in support for it by root programs.
>>>
>>> 2. It would put *everyone* in the ecosystem on notice that revocation is
>>> something that needs to be planned for. At the moment, organisations
>>> can deploy their infrastructure on the basis that "it'll never happen to
>>> us, we don't lose our keys / suffer from bugs / whatever", and they
>>> don't consider other causes of revocation. While the probability of any
>>> particular certificate getting chosen would be very low, that *definite*
>>> non-zero probability is likely to get more attention than any number of
>>> out-of-the-ordinary incidents that organisations can dismiss with "well,
>>> *that* would never happen to us!"
>>>
>>> - Matt
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"[email protected]" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/a/mozilla.org/d/msgid/dev-security-policy/CACAF_WisR_1vMJOMZat5YN98JQU7gHjx%3DW68vHEea9mqnRTtSw%40mail.gmail.com.
