FW: StartCom inclusion request: next steps

Inigo Barreira via dev-security-policy Thu, 14 Sep 2017 08:00:41 -0700

All,

Obviously this is not the message we would have liked to read. I will try to
explain, and rebut as far as possible, some of the comments posted here.

> 
> The Mozilla CA Certificates team has been considering what the appropriate
> next steps are for the inclusion request from the CA "StartCom".[0] As readers
> will know, this CA has previously been removed from trust[1], and so a re-
> application obviously involves particular scrutiny. In addition, several
> questions have been raised about the way in which the new StartCom PKI has
> been operated technically[2]. This is a proposal for the way forward, on which
> comments are invited.
> 
> Mozilla's considered view is as follows:
> 
> * It should have been obvious to StartCom that testing of their new systems
> needed to be done using a parallel testing hierarchy.

Those tests were done only to check the CT behaviour; there was no other
testing of the new systems. The certs were under our control the whole time and
lived for only a few minutes, because we revoked them immediately after
checking that they had been logged correctly in the CT logs. It's not a
mis-issuance in the sense that we didn't know what had happened and had to
investigate. It was not good practice and I can't excuse it, but it was not
related to the regular issuance procedure, as someone suggested. We provided a
report describing everything that happened and what we did to prevent it from
happening again, namely updating the EJBCA role permissions.

> That it was not obvious, is deeply concerning. It is also concerning that
> someone can sit at a terminal and issue random certificates with variable
> values in lots of fields, in what is to become a publicly-trusted hierarchy.

Well, it was possible at that time, but only the CA administrator could do it,
and only under many requirements. It's not a matter of sitting at a terminal
and starting to issue certificates: there were, and are, security mechanisms to
prevent "someone" from doing that, and I can list many. Probably most CA
administrators at other CAs had this capability (maybe not now), because the
majority of PKI software allows it and it's needed when building a hierarchy.

> It's not about numbers (e.g. "40 out of 50000"), it's about the process.
> 

This figure of 40 is the total number of "mis-issuances" discovered, not only
those related to the CT testing. At other times, as discussed on this list, the
number has mattered. Moreover, most of those 40 were "discovered" by us, and we
acted on them as required by the BRs. We revoked the majority of them within 24
hours of being notified internally. By the time they were posted in Bugzilla,
most had, as I said, been revoked, and we had started investigating what had
happened and what actions were needed. Some of these "mis-issuances" were due
to inconsistencies between the BRs and the Mozilla policy, such as the use of
different curves (allowed by the BRs but not by some browsers). Others
concerned pre-certificates, where it is not clear which requirements they fall
under, as in the discussion Jeremy started on the list. For example, is it also
necessary to revoke the pre-certificate when a certificate is revoked? Do
pre-certificates need to be considered certificates and meet the BRs and/or the
Mozilla policy?
Others again concerned the use of Unicode vs punycode, which is still under
discussion; a ballot on it even failed in the CABF. So the errors we made were
also made by others. I don't offer that as an excuse, but it suggests that the
rules were not clear to the CAs.

In mid-July we updated our issuance procedure to prevent these issues from
happening again. What did we do?
- Restricted the use of elliptic curves to those admitted by Mozilla
- Changed the certificate profile so that it has no differences in key
encipherment and key agreement
- Updated the internal DB of country codes
- Updated the systems to convert all domain names to punycode
- And, recently, developed a CSR checking tool to avoid the issue with RSA
parameters, because EJBCA didn't have one at that time (it now comes with the
new release 6.9.0)
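
Two of the checks listed above can be illustrated with a minimal Python
sketch. This is not StartCom's actual tooling: the names ALLOWED_CURVES,
curve_is_allowed and to_punycode are hypothetical, and the curve list assumes
that the curves admitted by Mozilla are P-256 and P-384. The conversion relies
on Python's built-in "idna" codec, which follows the older IDNA 2003 rules.

```python
# Hypothetical pre-issuance checks; the names and the curve list are
# assumptions made for illustration only.
ALLOWED_CURVES = {"P-256", "P-384"}  # assumed to be the admitted curves


def curve_is_allowed(curve_name: str) -> bool:
    """Return True only if the requested curve is on the allowlist."""
    return curve_name in ALLOWED_CURVES


def to_punycode(domain: str) -> str:
    """Convert a (possibly Unicode) domain name to its ASCII form."""
    # The built-in "idna" codec encodes each label separately; ASCII-only
    # labels pass through unchanged, so lowercase the result explicitly.
    return domain.encode("idna").decode("ascii").lower()
```

A request for a curve outside the allowlist would be rejected before
issuance, and every domain name would be stored and signed in its punycode
form.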

> As JC Jones wrote:
> 
> "This is a professional PKI operation, being overseen by industry veterans. If
> something as concrete as the issuance process had such a glaring quality
> assurance methodology failure, why should anyone believe that something
> much harder -- subscriber validation -- is going to be done correctly?"

Well, this is an opinion, and I fully respect it, but no one is free of
failures. Let's Encrypt (and many others) is having them too, as we have seen
recently with weak keys, etc., and I'm not the one to say that they are not
professional, that they lack a quality assurance methodology, or that they are
not acting correctly. I'm sure they are, but so are we. I can't criticize
anyone, and not just because we at StartCom are in a weak position in which
everyone is scrutinizing closely what we do.
For all these failures we acted quickly, because our internal procedures worked
well and, as I said, by the time the issues were raised most of the certs had
already been revoked and solutions were under way. It was not a case of "thanks
for letting us know, we didn't know, we are going to investigate, etc."

I can cite other opinions here. For example, Matthew Hardeman wrote:
"If Inigo has prior CA management experience and is running the technical 
picture at Startcom now, why not allow them to proceed under this new PKI 
infrastructure with past issues set aside and take a serious stance to any 
issues going forward.

As far as I know, the current manager of Startcom has not been previously 
accused of deception or bad action.  Far more than has been problematic in this 
early testing phase of their new PKI has been forgiven by the root programs 
before.

Nothing disastrous or intentionally dishonest has been done in their new PKI.  
Why not grant them a gentleman's chance to proceed and address any further 
issues with great scrutiny?"

> 
> * The key for their new root certificate was also used in a couple of
> intermediates (one revoked as it was done incorrectly - again, lack of
> testing!). While this is probably not a policy violation, it's not good 
> practice.
> 

Yes, it's not a policy violation. As explained, this was a problem in EJBCA
with the UTF8 encoding. It's not related to a lack of testing: we generated
intermediates in our development and QA systems using the same procedure, and
nothing went wrong with the others, but this one had the issue, so we had to
revoke it and create a new one. This happened in April.

> * StartCom's infrastructure audit, performed by Cure53, was frankly a security
> disaster. (They are using EJBCA for CA operations; this was an audit of their
> front end and customer management systems, which were rewritten by a
> team from their new owner, Qihoo 360.) The (PHP) codebase was full of
> holes, poorly commented, had few or no tests, and showed every evidence of
> being hacked together in an enormous rush. This does not inspire confidence.
> Cure53 say they retested a couple of months later and most of the holes they
> found were fixed - although they found quite a few more. All this does not
> bode well - Cure53 are not infallible, an audit is not a substitute for secure
> coding practices, and the initial results show that the software was clearly
> not built by people who understand software security. The summary of their
> results is attached to their Action Items bug[3], but it does leave out some
> of the more critical passages of commentary from the original, and of course
> does not show the particular holes found and their scope and severity.

Yes, it's true. The first security audit didn't go very well. That was also
mentioned during the CABF F2F meeting at Cisco. In our remediation plan we
imposed a very tight schedule on ourselves for this task, and we failed. It was
a very hard task with very little time, but the people at 360 tried everything
to get it done by the date, the end of December 2016. We did meet the date, but
with many failures. I think everyone has been in this kind of situation, and no
one writes code without errors the first time. So of course we went for
"another round", because the result was not acceptable. The second audit, as
you mention, shows that the issues found in the first one were fixed and that
some new ones came out, which were also fixed later on. Since then the R&D team
and the security team have evolved the system, and it is now robust.
In any case, we didn't go further and didn't go live until we had the OK from
Cure53. Only afterwards did we generate the subCAs in production and start to
issue certificates.
And of course Cure53 is not infallible, but it's true that those security
audits went very deep, and the security team at 360 has continued improving
the system.

> 
> * The WT/BR/EV audits on StartCom's website are significantly qualified, and
> they include lack of controls on issuance. They should have clean ones done
> before we permit any inclusion request to proceed. The qualifications include:
> 
> - Risk analysis process defined but not implemented
> - Business continuity plan defined but not implemented
> - Audit logs not guaranteed to have integrity
> - Monitoring system cannot detect security-related changes to
>   Certificate Systems
> 

Yes, this is also true. Our WebTrust audits have findings, but according to the
auditors who signed the reports they are not so significant, so I assume the
auditors considered the system good enough to issue the audit reports. Of
course, I would have liked a clean one, but it's also true that the reports
indicate that most of the findings are fixed, and I think it's a matter of
transparency.
We prepared a Corrective Action Plan providing solutions for all the findings
and indicating when each would be applied. We sent it to the auditors with all
the evidence showing that most had been properly fixed.
I'd like to explain the ones you mention.

Risk analysis: the risk analysis was defined and was implemented. We had done
the 2016 risk analysis, and the new 2017 risk analysis was scheduled for
October this year, following the agenda set. The auditors requested the 2017
one, so we did it just after the audit. In any case, the 2017 risk analysis is
done and has been sent to the auditors.

Business Continuity Plan: the BCP was defined and implemented, but only
partially. We were very optimistic about meeting everything we had written, and
at the time of the audit some things in the document had not been finished in
time; that's the finding. Of course, I could have written a less onerous BCP,
met what we had in place and avoided the finding, but I wanted a very good one
and so didn't mind the finding. BTW, we've since finished our BCP according to
the document.

Audit log integrity: all logs were and are signed internally in the PKI system
(it's part of the EJBCA configuration), and we provided all the evidence. But
the auditors also requested the same for some other external components, and
not all products come with that feature, so we had to develop it. It was
applied at the beginning of June, and we sent the evidence to the auditors.
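
As a rough illustration of what "logs with integrity" can mean for a component
that has no built-in log signing, here is a minimal Python sketch of an HMAC
chain. It is an assumption for illustration only: it is neither the mechanism
EJBCA uses nor the one StartCom developed, and chain_logs and verify_chain are
hypothetical names. Each tag covers the entry and the previous tag, so editing
or removing an earlier entry invalidates the tags that follow it.

```python
import hashlib
import hmac


def _tag(key: bytes, prev_tag: bytes, entry: str) -> bytes:
    """HMAC-SHA256 over the previous tag followed by the entry text."""
    return hmac.new(key, prev_tag + entry.encode("utf-8"),
                    hashlib.sha256).digest()


def chain_logs(entries, key: bytes):
    """Return (entry, tag) pairs in which each tag depends on the last."""
    chained, prev_tag = [], b""
    for entry in entries:
        tag = _tag(key, prev_tag, entry)
        chained.append((entry, tag))
        prev_tag = tag
    return chained


def verify_chain(chained, key: bytes) -> bool:
    """Recompute every tag in order; any mismatch means tampering."""
    prev_tag = b""
    for entry, tag in chained:
        if not hmac.compare_digest(tag, _tag(key, prev_tag, entry)):
            return False
        prev_tag = tag
    return True
```

In practice the key would have to be kept away from the host that writes the
logs, otherwise an attacker on that host could simply recompute the chain.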

Monitoring: StartCom's monitoring system was focused mainly on the server
status (CPU, load, memory, etc.) of all servers in the PKI infrastructure. In
addition, all the services provided were monitored, to check that, even when
the server hosting a service was OK, the service itself was also live and
running; that is, checking service availability.

This finding is about updating the monitoring configuration to alert specific
StartCom teams/people when sensitive configuration files in the infrastructure
are changed. Internally, StartCom has a manual procedure for when
changes/updates to these files are requested: they need the approval of the
service manager before the system is modified/updated. The auditors wanted this
done automatically as well, so we updated the monitoring to alert specific
teams immediately when sensitive configuration files are changed. We extended
the scope to cover all systems (DB, web servers, nginx, ...) and are using a
tool called inotify, which monitors and checks all write operations. This was
done at the beginning of June.
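
The idea of detecting changes to sensitive configuration files can be sketched
in a few lines of Python. Note that this is not inotify: inotify is
event-driven, whereas this sketch swaps in a simpler technique, comparing
SHA-256 digests between two polls. The names snapshot and changed_files are
hypothetical, and the alerting step is left out.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each path to the SHA-256 of its contents (None if missing)."""
    digests = {}
    for path in paths:
        p = Path(path)
        if p.is_file():
            digests[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest()
        else:
            digests[str(p)] = None
    return digests


def changed_files(before, after):
    """Return the sorted paths whose digest differs between snapshots."""
    return sorted(p for p in after if before.get(p) != after[p])
```

A monitoring job could take a snapshot after each approved change, compare it
with a fresh one at every poll, and alert the responsible team whenever
changed_files returns a non-empty list.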

> * Certnomis chose to cross-sign StartCom while StartCom had audits with
> significant qualifications,

Well, the auditors explained in the reports that most of them were fixed. So
even though they are in the report, because that's what the auditors found, and
despite some discussion about them (I didn't agree with some), you can't
consider them significant (I know that's an opinion), or at least doing so is
contrary to the auditors' opinion. You can contact the auditors if you wish to
ask for their opinion about the audit report, what they found, what we
provided, the CAP, etc.

> and allowed them to recommence publicly-trusted issuance before they had
> demonstrated to Mozilla that they had met the remediation conditions
> required. While this may not have been against the letter of our
> requirements for StartCom to restart trusted operations, we feel it was not
> in the spirit of them.
> 

I'm not sure how to interpret this. We followed our remediation plan and the
Mozilla requirements; once we had met all of them, we reapplied for inclusion
in Mozilla, which was in July (some time after the mis-issuances, security
reports and audit findings), following the same steps as any other CA. We've
seen others do the same in the past. Certinomis set us some requirements, such
as having the WT audit certificates, but maybe other CAs set no requirements at
all and simply cross-sign the new CA, which then, for example, gets its audit
in the meantime.

> All in all, this attempt to start a new CA compares poorly with other recent
> executions of this process, such as those by Google and Amazon.
> While those companies do have significantly more resources than StartCom,
> many of the issues raised are questions of good practice, not of money.
> 

I don't know how the companies you mention applied, but I remember lots of
issues regarding Google and the acquisition of the GlobalSign roots and how
they proceeded.
Again, I don't know these examples, so I can't speak for them, and I don't like
to talk about others. But it's different to start when you have customers
requesting certificates, asking about everything that happened, etc., than to
start from scratch with no customers yet, or very few, or when you have other
roots accepted in the root programs from which you can provide your services
normally. We were in a very difficult situation, with lots of pressure even
from the Mozilla community: even though we were distrusted (we didn't exist)
and had not yet applied for re-inclusion, we were constantly asked, advised and
questioned about many things in a way that maybe is not done with other CAs.
And yes, at the end of the day, it's a matter of money. If you have no
resources, or they are limited, there are many things you can't do and new
projects have to be put on hold. I'd like to do many things at StartCom at the
same time, and I have lots of projects in mind to develop, but they can't be
done for lack of money. I don't think this is only a matter of good practice,
because it is all related.

> Conclusion: StartCom's attempt to restart the CA was rushed.

Yes, I admit it. 

> One could speculate why that was; perhaps due to a requirement to start
> generating income again.

Well, in the end this is a business, and I don't think anyone on this list is
working for free. At the end of the day everyone has his/her salary, etc. But
that was not the main reason, because getting included in the root programs
takes time. Rather, we wanted to give the customers who supported us through
the distrust (which IMHO was very aggressive in StartCom's case) a solution, by
building a new, fresh and clean system.

> But a process of building a production PKI by trial and error, revoking your 
> mistakes, is entirely inappropriate.

I don't think we've built a trial-and-error PKI system. As you've said, 40 out
of 50000; on this question the number matters. If everything were a
trial-and-error system, of course that would be unacceptable, but I think this
is not the case. As I said, some of those 40 are due to differing
"interpretations" which are still being discussed on the Mozilla list. Revoking
is the first thing a CA should do, followed by investigating the issue and
proposing countermeasures to prevent it from happening again. And this is what
we have done.
And, using the same argument, and seeing the issues described on the Mozilla
list this summer, I don't think that StartCom is doing that badly, judged
either by the numbers or by the errors themselves. As I said, we have
implemented and improved many things: integrating cablint/X509lint into our
processes and crt.sh into our CMS system, adding key validations, keeping up
to date with all new EJBCA releases, etc.

> The qualified audits include missing or unimplemented processes, and
> audit/monitoring failures which lead to uncertainty as to how well the new
> roots were protected.

This has been explained, and I don't think it is accurate. Regarding the roots,
they are very well protected: we held a root key ceremony witnessed by an
auditor, with a final report stating that everything was OK.

> This all shows that StartCom were not ready to start up the production PKI
> when they did. And yet Firefox today trusts tens of thousands of
> certificates issued by this PKI.

I'm not sure about this. We were distrusted in October 2016; the new system,
which is not related to the old one (now switched off), started operating in
April 2017. None of the new certs are trusted in Mozilla Firefox, and we notify
our users of this through messages on the website and in the applications.
> 
> Considering all this, our proposal is to require that StartCom begin again
> with new-new roots. These roots should be generated inside an
> already-security-validated infrastructure, as part of a new WT/BR audit
> process, the end results of which are clean because they already have all
> the policies and processes in place before the roots are generated.
> They should also build and use a parallel testing hierarchy, so that major
> operations done on the production PKI are done right, first time.
> Once they have generated new-new roots and intermediates, and got clean
> audits, they can re-re-apply for inclusion.

I don't know how to understand this requirement. We're required to generate
new roots and intermediates, get a clean audit and then re-apply. So the only
difference from what we have already done is the clean audit, which I've
already explained. Is this interpretation OK?

> 
> No-one should be allowed to cross-sign this new hierarchy until, at minimum,
> Mozilla has pronounced itself satisfied that the 5 (or 6) remediation
> conditions which were imposed have been met. To permit otherwise is to
> allow the bypassing of Mozilla's requirements.

OK, I see this is a new requirement that was not imposed last time, when you
recommended and allowed us to be cross-signed, as many other CAs have done in
the past, in order to stay in business.

We've met all the conditions: new system, new management, security audit,
WebTrust audit and CT logging. Those conditions did not say that the WebTrust
audit had to be clean. But, as we indicated some time ago, we wanted a clean
one and therefore wanted to perform a new audit (we told the auditors so). They
asked us to wait until we had everything in place (as also said recently, only
the TSA and BCP issues were pending), and then to wait another 2 months, as the
WT requirements indicate.

> 
> We should add the existing Certnomis cross-signs to OneCRL to revoke all the
> existing certificates. As of 10th August (now a month ago) StartCom said they
> have 50000 outstanding SSL certs which are valid due to the Certnomis cross-
> sign.

I've never said this. In fact, although the cross-signed certificates were
provided to us in July, we have never used them or given them to any of our
customers to build a trusted path. So none of those 50000 certs, or the newer
ones, chain through the Certinomis path, because none of them have it. All
those 50000 certs are untrusted, because we're not in the Mozilla root store:
the new root is not included and the old one was distrusted.
In fact, I recently asked for permission to use the Certinomis cross-signed
certificates and have had no response. I don't know whether this is an
administrative silence that would allow me to use them, but until we have clear
direction we have not used them.

> Revoking them all by adding intermediates to OneCRL would therefore lead to
> non-negligible disruption. But these were issued by an org whose most recent
> audits are qualified, which is under sanction, and about whose issuance
> practices and process safety there is a reasonable amount of doubt.

Again, I don't understand why you say this. We haven't used the Certinomis
path, so there is no need to revoke anything. Regarding the audits being
qualified, does this mean that only clean audits are valid? This is not a
requirement in the Mozilla policy, AFAIK.

> We may allow a grace period for customers to replace them with certs from a
> trusted provider.
> 
> We are not sure what to do about StartCom's poor quality PHP code. While
> continued use of it would cause us concern, we are not really in a position to
> request particular changes to it, or a complete rewrite, in a verifiable way.
> On the other hand, a security audit is a remediation condition, and the
> current codebase can hardly be said to have passed with flying colours.

I think this has been explained. I don't understand why you call it "poor
quality PHP code". As I said, I admit that the first audit was not good, but
that's not the reality now, and it was not when we applied for re-inclusion.
Are you going to check all the code of all CAs? I don't think so.
> 
> We feel some sympathy for StartCom CEO Inigo Barreira, who has been
> placed in a difficult position since he took on the role,

Thanks, I really appreciate it.

> but we need to treat all CAs equally and fairly, according to our
> professional judgement. We would not accept this set of circumstances from
> other CAs, and so we feel we cannot accept it here.

Well, frankly, I think the requirements imposed on StartCom are not the same as
those you require of other CAs. We've had additional requirements, and hence
you're not treating all CAs equally. We accepted those requirements, and I
think we've gone further than that (e.g. acquiring a new, well-known PKI
solution like EJBCA). But it's true that the reasons for distrusting StartCom
were not mainly technical, and we see (technical) issues at other CAs which are
not examined so deeply. So I think that StartCom is treated differently.

We applied in July, following all the required steps and responding to the
conditions imposed. Some of the issues happened before the application; most of
the audit findings were fixed before the application; we applied some of the
countermeasures explained in mid-July, just after the application; and since
the application we've made lots of improvements, as explained. I think Mozilla
had not yet started handling our application, because we were waiting in the
queue due to the number of applications.
So, when does this process start? I mean, are you dealing with historic or past
issues from even before we applied? I think these questions have been posted on
the Mozilla list, and this is where I think we're not being treated equally.

There has been some discussion about StartCom during this time, with people
digging up and posting unrelated things just to hurt us, even when we were
distrusted and none of our certificates were valid anyway. Again the number
matters: 50000 is not a usual number for a distrusted CA, and these have been
issued with the new system, just since April, and only 40 had some issue,
which, as mentioned, may not be so clear and is still under discussion. Yet the
charges against us make it seem that we are a total disaster and deserve
neither trust nor a chance. If the community feels so, then I would like to see
how others are treated when they make a mistake. And I will again think that
we're not being treated equally.

> 
> Comments on this proposal are welcomed.

> 
> Gerv
> 
> 
> [0] https://bugzilla.mozilla.org/show_bug.cgi?id=1381406
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1311832
> [2] https://bugzilla.mozilla.org/show_bug.cgi?id=1369359
>     https://bugzilla.mozilla.org/show_bug.cgi?id=1386894
>     https://bugzilla.mozilla.org/show_bug.cgi?id=1386891
>     https://bugzilla.mozilla.org/show_bug.cgi?id=1381406#c5
>     and other comments in m.d.s.p.
> [3] https://bugzilla.mozilla.org/attachment.cgi?id=8886970
