Missing the point
> As a global CA we must walk a tightrope in balancing the requirements of 
the root programs and subscriber needs, especially for critical 
infrastructure.

This is a very worrying sentence. It seems that both Entrust and many of 
their subscribers (even more worryingly, subscribers responsible for 
critical infrastructure) completely misunderstand what the purpose of the 
root program requirements is. These rules, requirements, guidelines, 
policies, etc. are here to keep us safe. And I don't mean us as in relying 
parties, I mean us as in everyone. That there is a need to balance these 
requirements against the needs of Entrust subscribers makes me worry about 
what those subscribers are doing. Why are so many organizations running 
critical infrastructure not prioritizing compliance with safety 
regulations? 

> Many of our customers represent critical infrastructure due to their 
roles in the financial system, government, transportation, and other 
industries and there are real challenges in meeting the guidelines. We 
recognize that it is not the responsibility of our subscribers to resolve 
these conflicts. It is our responsibility as part of our commitment to 
meeting the CA/Browser Forum requirements and protecting the WebPKI. 

It's a CA's responsibility to revoke certificates when required. When this 
cannot be done without causing significant harm because of subscribers' 
lack of capability to handle such a revocation event, then ensuring that a 
future revocation event can be handled without causing significant harm is 
a shared responsibility between the CA and their subscribers. In Mozilla's 
"Responding to an Incident" guidance, the final listed expectation in the 
case of delayed revocation states:

> * You will perform an analysis to determine the factors that prevented 
timely revocation of the certificates, and include a set of remediation 
actions in the final incident report that aim to prevent future revocation 
delays.

If the causes that prevent a timely revocation while avoiding significant 
harm are internal to the subscribers, then remediation actions must involve 
those subscribers. I understand that many of Entrust's customers are 
enormous corporations that may need time to implement the necessary 
changes. But once a CA becomes aware that one of their subscribers isn't 
capable of handling revocation as required by the BRs, then future issuance 
of certificates to that subscriber must be predicated on that subscriber 
making commitments to be able to handle timely revocation. Obviously we 
don't want to risk harm in a future revocation event, but without requiring 
the subscriber to make these commitments you are in fact making it your 
policy to not apply the BR revocation deadlines for that subscriber.

In Comment#82 on bug 1886532 
(https://bugzilla.mozilla.org/show_bug.cgi?id=1886532#c82) Bruce Morton 
writes:

> Although it has been difficult, we now understand that the community 
places priority on strict adherence to the rules, and views revocation as a 
tool to influence subscribers into modifying how they use TLS certificates, 
and is willing to accept much more harm to subscribers and users of the 
internet than Entrust believed was acceptable.

I want to clarify that for me, this isn't golf. I want these stewards of 
critical infrastructure to adhere to the rules because if they don't, I 
believe that we risk much greater harm in the future. If a subscriber 
genuinely requires weeks or months to reissue their certificates without 
causing significant harm, then I agree that a delay in revocation would be 
prudent. But it is simply unacceptable for organizations controlling such 
critical infrastructure to be so extremely incapable, whether for 
technological or organizational reasons. The statement "and views 
revocation as a tool to influence subscribers into modifying how they use 
TLS certificates" implies that Entrust did not believe that revocation 
should be used to influence how webPKI certificates are used, but is that 
not what revocation is? If a subscriber is not using a certificate 
according to the BRs or TLS Guidelines, it must be revoked. The threat of 
revocation is literally a tool to influence the behavior of subscribers.

What I believe the community is trying to communicate is that we are 
trying to prevent future harm. Assurances from Entrust about how they 
**would** be able to revoke within 24h for a security issue, I assume 
without causing significant harm, ring very hollow when Entrust is 
demonstrably incapable of revocation within weeks or months when the 
problem is "only misissuance". But even if we assume that we can trust that 
Entrust and their subscribers can handle a mass revocation event within 24h 
in case of a security breach, that still leaves us with this:

> In our conversations with Subscribers, we transparently disclosed that 
there was no security risk to relying parties if the affected certificates 
were not revoked, and this context understandably influenced the 
prioritization.

"and this context understandably influenced the prioritization." Is there 
any other interpretation of this sentence than: we could follow the rules, 
but we would rather spend our money elsewhere? Taken with the statements 
from the June 21st update, it is clear to me that the harm Entrust is 
trying to avoid is the cost of following the requirements of the root 
programs.

Refusal to learn, bug 1890898

It's important to seize the opportunity to learn from your incidents. Why 
is Entrust so stubbornly clinging to their analysis in #1890898 that the 
certificates weren't misissued? I have not seen a single member of the 
webPKI community outside of Entrust share this position. Two root programs 
disagree with Entrust. The response from Entrust should not be: "We still 
think that we are right, but you're the boss", it should be: "%#?!, how 
could we come to such a different conclusion from everyone else?" The root 
cause analysis for this section is about how the certificates came to be 
misissued; it completely omits the root cause of why Entrust ~~was~~ 
is not aligned with the rest of the webPKI community when it comes to 
interpreting how the TLS BRs and EV Guidelines interact with their CPS in 
this issue. In their June 7th report Entrust thanks industry expert Don 
Sheehy for his contributions. While it might not be polite to put him on 
the spot, I believe it would be very interesting to hear directly from him 
whether he agrees with Entrust and, if he disagrees, whether Entrust knew 
but still chose to proceed with their own analysis.

It's good to see that 1890898 was included in the updated report, but the 
fact remains that it is one of the issues listed by Mozilla on 
https://wiki.mozilla.org/CA/Entrust_Issues that are the subject of the 
requested report. As late as June 5th they posted their "revised 
analysis"; while it is possible that they hadn't yet written anything about 
the issue in their June 7th report, it's not very likely. It would be 
interesting to see what they had in their drafts regarding 1890898 and, if 
there was anything, when it was removed. But the complete lack of actual 
learning from this issue is incredibly alarming and undermines any attempt 
to believe that Entrust can be a functioning member of the webPKI community.

Broken promises
I don't know exactly how to approach this issue but I haven't seen it 
addressed by others, and I think it needs to be confronted even if it is 
about actions (or inaction) of specific persons. I am open to the 
possibility that others when faced with the same decision came to the 
opposite conclusion, and that I am in fact wrong in taking up this issue. I 
am trying to do this as respectfully as possible.

In the June 21st report we can read:
> Second, our organizational design and governance impeded senior 
leadership and cross-functional evaluation and awareness of CA/B 
requirements. Key decisions were in the hands of a limited number of 
employees in the digital certificates business unit who held duties across 
multiple functions, including compliance. When responding to recent 
incidents, the team did not adequately communicate applicable requirements 
to senior leadership. This led to incorrect decisions and instances where 
we did not follow delayed revocation and reporting processes as laid out by 
Mozilla, including late and incomplete incident reports. Knowledge of 2020 
commitments was similarly confined to a small number of business unit 
employees, without broader leadership team/organizational awareness.

I don't understand this explanation. Is senior leadership the one making 
the decision to delay revocation or not revoke? But those decisions are 
communicated to the community via Bugzilla, and is that not done through 
the business unit employees that have knowledge of the 2020 commitments? 
It's the same person posting: "We will not the make the decision not to 
revoke." in 1651481, that this year posted: "we decided to not revoke due 
to exceptional conditions listed in this report." in 1890898. I doubt that 
senior leadership, or their proxies, weren't informed about those 
commitments; more likely, they did not understand or care about how 
serious they were.

The organizational changes that were finally specified more clearly in the 
June 21st report are obviously long overdue, but I doubt that they will 
have the impact needed for me to trust Entrust.

Conclusions

While the June 21st report is a much better attempt than the June 7th 
report, I believe that it still falls short of what is expected and 
required.

They hear the community saying: It's your responsibility to revoke, not the 
subscribers'. They think they understand, but they don't. If they don't 
understand that the requirements of the root programs are there to keep us 
safe, how are they supposed to educate their subscribers about that fact?

Then they miss the point again. While they are justifiably getting a lot of 
criticism over their failures to revoke on time, Entrust fails to 
understand that what the community wants to see are improvements so that 
the same mistakes don't happen again. I worry about what harm could happen 
in the future when there is a security issue if nothing has changed for the 
subscribers that cannot handle revocation within 5 days.

When it comes to 1890898, perhaps Entrust feels the need to stick to their 
"revised final incident report" so that they can appear consistent, but if 
that is so, I think it would be a great mistake. We are here now because of 
serial poor decision-making and poor incident responses. For me, Entrust's 
obstinate refusal to actually change is exemplified in 1890898, and 
completely undermines any trust I have in them.

Zacharias

On Sunday 23 June 2024 at 00:39:03 UTC+2 Ryan Hurst wrote:

> Part of me wants to commend Entrust for this response. If we can believe 
> its sincerity—and this is a if given their recent history and how this has 
> played out—it took 13 compliance incidents and 107 days for their 
> leadership to recognize, at least publicly, the systemic issues that have 
> happened under their watch, and that does not even count the fact that this 
> has been a problem since at least 2020.
>
>
> The disappointing thing is that here we are, 30% into the year, and what 
> we have is a commitment to restructure a part of Entrust and fund it to do 
> better without concrete actions to address the specific issues. Meanwhile, 
> they are still trusted and exposing the internet to their continued 
> management challenges. I can’t help but think this response is too little, 
> too late. With that said, it does indicate some level of recognition of how 
> bad things have gotten, which is a step in the right direction.
>
> With all that said, it’s difficult to imagine ISRG or Sectigo, for 
> example, showing the same level of disregard for the processes at play or 
> taking this long to get to this point. While this organizational change 
> might help address that, at the same time as of three days ago, it appears 
> that Entrust was still suggesting that EV certs issued in violation of 
> their CPS weren’t actually misissued. This raises questions about whether 
> they have truly internalized the gravity of the situation or if this public 
> gesture is just that—a gesture.
>
> Beyond that, the thing that I can’t help but ask myself is how long is too 
> long. I’ve not yet gone back and looked at the average response time for 
> other incidents, but just from looking at this thread and the associated 
> bugs since March 6th, there are still missing responses/updates that were 
> promised, and those that were provided were shallow. In my experience as an 
> engineering leader, my first priority in a situation like this would be to 
> ensure that we never missed a promised or obligated response, the second 
> would be to make sure we had done everything possible to address the 
> identified issues immediately. It’s unfortunate that at this point, we are 
> not even there yet. Entrust has had every opportunity to do the right 
> thing, but even with the world watching, they didn’t seem to prioritize it. 
> As a result, today’s response might be better categorized as performative.
>
> Ryan Hurst
>
>
> On Friday, June 21, 2024 at 2:17:30 PM UTC-7 Wayne wrote:
>
>> This has been written without checking prior replies - there may be 
>> overlap.
>>
>> First off, good work on the new report addressing more matters however 
>> this should have been your original report at a minimum. Before I even 
>> start I will outright state that I hope that Entrust actually improves 
>> throughout this and while this comment will be cleaned up it reflects an 
>> ongoing opinion as the report is read.
>>
>> First looking at the letter I will only note this paragraph:
>> "We are disappointed as this does not represent Entrust values and falls 
>> short of the standards we set for ourselves. We also want to make sure it 
>> is understood that none of these lapses have been malicious or done with 
>> ill-intent to make the internet less secure. As a global CA we must walk a 
>> tightrope in balancing the requirements of the root programs and subscriber 
>> needs, especially for critical infrastructure. In some cases, we did not 
>> strike the right balance."
>>
>> It does trouble me that compliance is seen as a balancing point against 
>> issuance for critical infrastructure. There has been a common talking point 
>> of Entrust's delayed revocation incidents of the concept of irresponsible 
>> revocation. I point it to Entrust that such a scenario only presents itself 
>> when a CA is culpable of irresponsible issuance.
>>
>> As I read through this I keep seeing a repeating pattern of changing the 
>> organizational structure and creating committees and board of 
>> cross-discipline personnel. While this is all good in theory, I am 
>> concerned that this is not addressing the actual root causes of internal 
>> decision making and that the outputs will be just the same with a different 
>> label on the team providing it.
>>
>> Before I delve into any minutiae of the report itself I do find it 
>> noteworthy that in incident #1890898 (Entrust: Failure to revoke OV TLS 
>> - CPS typographical (text placement) error)) 
>> <https://bugzilla.mozilla.org/show_bug.cgi?id=1890898> we have a 
>> functional example of the new cross-functional team evaluating compliance 
>> and coming to a decision. Now this could be a third unspecified team but 
>> given the report I presume this is the template going forward, for brevity 
>> I'll keep this to the broad strokes:
>>
>> 2024-04-11: Issue opens, mis-issuance confirmed but no intent to revoke. 
>> A long conversation ensues, nothing changes until a day before the June 7th 
>> report appears.
>>
>> 2024-06-06 <https://bugzilla.mozilla.org/show_bug.cgi?id=1890898#c28>: 
>> "We reviewed and consulted with independent external experts on this 
>> revised analysis, and based on this broader consultation, we now believe 
>> there was no mis-issuance and thus no need to revoke the affected 
>> certificates. A detailed analysis is below."
>>
>> Following that analysis Mozilla and Chrome's Root Programs give a 
>> different opinion.
>>
>> 2024-06-18 <https://bugzilla.mozilla.org/show_bug.cgi?id=1890898#c42>: 
>> "On this basis, we will treat this as a mis-issuance, and intend to 
>> complete revocation by end of day Saturday, June 22."
>>
>> 2024-06-19 <https://bugzilla.mozilla.org/show_bug.cgi?id=1890898#c49>: 
>> "On the last question, our position is that there was no mis-issuance—not 
>> that there was a mis-issuance and we decided not to revoke which is the 
>> situation that recommends discussion with affected root stores."
>>
>> Now, using the 06-06 opinion as a basis we have an example of this new 
>> cross-functional team. They reviewed the original incident and came to a 
>> conclusion that a) was not the same as Entrust in April, but crucially b) 
>> was not compatible with the viewpoints of the Root Programs who spoke up. I 
>> am of the strong belief of evaluating institutional changes not on their 
>> stated internal changes, but on their outputs. The decisions are all we 
>> will see, they are all that will matter in practice.
>>
>> I will not detail line by line but I do notice that some factual 
>> discrepancies in the original report have been addressed. It would be good 
>> to find out how those came to be in the first case. There are still 
>> outstanding ones that I already stated previously.
>>
>> >>Note: During our investigation of this issue, we noted that a subset of 
>> 1,975 EV certificates were also issued without the Entrust EV policy 
>> identifier (OID), based on our interpretation of the ballot update.
>> >This is also a miscount, presumably due to the original figure being 
>> 1963 + 6 certs on a test site that are being double-counted.
>>
>> On reading further in 2.1.1 Entrust have outright stated they still stand 
>> by their incorrect analysis as previously noted in this reply. This speaks 
>> volumes as to the decisions that will occur going forward. Within 2.1.3 
>> there is a mention of Entrust continuing to issue certificates and advocate 
>> their position, but I am seeing no reflection as to the root cause of what 
>> causes them to advocate for their incorrect positions to this day. Not a 
>> single line of 2.1.4 addresses this either.
>>
>> Oddly 2.2.3 does not mention that on April 3rd "The issue was escalated 
>> to our verification team for further investigation.". Instead it purports a 
>> subtly different timeline where nothing happened until the 15th. The April 
>> 4th issue as stated in the bugzilla timeline is also absent.
>>
>> It is at this point in the report that my original reply must have gotten 
>> lost as I still have outstanding issues. I am quoting my original reply 
>> below:
>>
>> >>2.3.4 Improvement Plan
>> >>...
>> >>Automate CPR form to collect all required information at the outset 
>> from the reporter rather than relying solely on email
>> >This goes back to policy issues discussed for years now, see:
>> >https://github.com/mozilla/pkipolicy/issues/98
>> >https://github.com/cabforum/servercert/issues/201
>> >https://bugzilla.mozilla.org/show_bug.cgi?id=1650234
>>
>> Now, moving on. In 2.4.1 I am again mis-identified as a reporter of the 
>> EV cert issue. This does not factually matter but is amusing as the initial 
>> factual corrections show that part of my response was read and applied.
>>
>> The only significant change I can see in 2.5.1 is the insistence that the 
>> analysis Entrust performed on the mis-issuance not existing on the OV TLS 
>> Typo issue must still be correct. As previously stated, I do not see how 
>> this is compatible with multiple Root Programs stating otherwise.
>>
>> I am confused about 2.5.3 though, it is about delayed revocation but the 
>> RCA is focused on the technical issue in the original incidents. 2.5.4 
>> contradicts itself from paragraph to paragraph. A commitment to revocation 
>> and replacement, and then statements that delays will be managed on a 
>> case-by-case basis. 
>>
>> It is a bit troubling to see the conclusion state the following:
>> "The mis-issuances we experienced were technical non-conformities and, 
>> had any one of them happened in isolation, they would not have resulted in 
>> us taking such a hard look at our program and finding the opportunities 
>> that we did."
>>
>> Regarding ACME, I previously stated this question and will repeat it now: 
>> Can you make any guarantees that ACME will be a requirement for subscribers 
>> going forward, and that they will not be charged extra for using these 
>> systems?
>>
>> Looking into 4.3 Appendix 3: Success Measures I won't address each 
>> individually. I am curious how you intend to get the WebTrust annual audit 
>> results to result in 0 qualifications in the space of a year. I would 
>> suggest an element for Communication is added to address how often a 
>> question has to be restated or followed up on due to a lack of clarity and 
>> transparency. Otherwise the list presents a minimal standard for any 
>> complying CA, if this is not kept by any CA it would be further cause for 
>> concern.
>>
>> Once again in evaluating against what was requested I am struck at how 
>> the systemic failures are not being addressed. We have commitments to 
>> committees and boards, but the decisions are what truly matter. There is no 
>> mention of what policies caused these initial issues and how they were not 
>> adhered to. The 2020 commitments are only highlighted due to every comment 
>> noting it specifically, no attempt seems to exist to evaluate against 
>> historical issues.
>>
>> On the 2020 commitments I am deeply troubled about this statement in 
>> particular:
>> "Knowledge of 2020 commitments was similarly confined to a small number 
>> of business unit employees, without broader leadership team/organizational 
>> awareness."
>> This should have came up in audits which cover incidents on bugzilla. 
>> What happened? Did the auditor only address this with the same small number 
>> of business unit employees and somehow no note of these commitments made it 
>> into any report that went further up the chain? What confidence can we have 
>> in any bugzilla-specific commitments outside of this report going forward?
>>
>> As a final note I will highlight this section:
>> "As part of our response process to the Mozilla community, Entrust 
>> assigned a group of three senior leaders, as well as an external 
>> consultant, to review each incident to validate and expand root cause 
>> analysis."
>>
>> Can we please have a breakdown on Entrust's end of what their original 
>> opinion was at the start of each incident, and how these personnel would 
>> evaluate the situation if it were to happen today? I sincerely hope that 
>> #1890898 is not an example going forward.
>>
>> The point of incident reports and action items is to ensure things do not 
>> repeat, knowing that the decision-making process is repaired would be one 
>> small step.
>>
>> - Wayne
>>
>> On Tuesday, June 18, 2024 at 6:35:48 PM UTC+1 Amir Omidi (aaomidi) wrote:
>>
>>> I am not going to say with certainty that Entrust is definitely putting 
>>> Chrome over Mozilla. However, I hope they know that most Linux systems out 
>>> there use the Mozilla root store directly.
>>> On Tuesday, June 18, 2024 at 1:12:19 PM UTC-4 Mike Shaver wrote:
>>>
>>>> On Tue, Jun 18, 2024 at 12:49 PM Walt <[email protected]> wrote:
>>>>
>>>>> I'd just like to point out that we now have a situation where Entrust 
>>>>> is in the position of seemingly valuing the opinion of other Root 
>>>>> Programs 
>>>>> over Mozilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1890898#c42
>>>>>
>>>>> In Comment #37, it was hinted at (and made slightly more explicit in 
>>>>> #39) that the opinion of the Mozilla RP is that the attempt to 
>>>>> re-characterize these certs was not going to be looked kindly upon, and 
>>>>> only once a Google RP member explicitly said that it was the Google RP 
>>>>> opinion that the certs remained mis-issued was any movement made on 
>>>>> re-confirming the mis-issuance and taking action to revoke them.
>>>>>
>>>>> Also, if we're in a position where Entrust is finally able to commit 
>>>>> to revoking certs within a 5 day period (setting aside that these certs 
>>>>> technically need a delayed revocation bug as the mis-issuance was known 
>>>>> as 
>>>>> far back as 2024-04-10), why are other incidents not able to be resolved 
>>>>> in 
>>>>> this amount of time? Is it because Google showed up? 
>>>>>
>>>>
>>>> We’ve seen this behaviour in other incidents as well, I believe 
>>>> including the cpsURI one that has turned into a magnet for evidence of 
>>>> poor 
>>>> operation and lack of transparency and responsiveness. I remarked on it in 
>>>> my initial snarky reply to the Entrust Report, in fact.
>>>>
>>>> From a realpolitik perspective their behaviour could indeed be 
>>>> rational, especially when the only tool root programs have is distrust. 
>>>> Firefox would suffer substantial market disadvantage if it stopped 
>>>> trusting 
>>>> Entrust certificates when other browsers didn’t. I think people generally 
>>>> underestimate how much Mozilla would be willing to take near-term pain to 
>>>> protect users, but it’s also possible that I am overestimating it.
>>>>
>>>> Related to that, I think Chrome’s root program representatives have 
>>>> generally been more willing to take a concrete position quickly, so 
>>>> Mozilla 
>>>> might be waiting for more explanation when Chrome decides that there’s no 
>>>> explanation that could suffice, or similar. The root programs tend to be 
>>>> in 
>>>> agreement more often than not (virtually always with Chrome and Mozilla, I 
>>>> would say, excepting some slightly different root store populations), so 
>>>> it 
>>>> may be somewhat irrelevant whose opinion spurs motion.
>>>>
>>>> Realpolitik analysis aside, I do agree that Entrust has created the 
>>>> impression that they care much more about Chrome’s opinion than Mozilla’s, 
>>>> which IMO might not be the best posture to take given that Mozilla and its 
>>>> community are the locus for the processing and evaluation of the incidents 
>>>> in question.
>>>>
>>>> Mike
>>>>
>>>>
>>>>
>>>>
