On Tue, Mar 13, 2018 at 4:02 PM, Ryan Sleevi <r...@sleevi.com> wrote:
>
> On Tue, Mar 13, 2018 at 4:13 PM, Matthew Hardeman via dev-security-policy
> <dev-security-policy@lists.mozilla.org> wrote:
>
>> I am not at all suggesting consequences for Let's Encrypt, but rather
>> raising a question as to whether that position on new inclusions /
>> renewals is appropriate. If these things can happen in a celebrated
>> best-practices environment, can they really in isolation be cause to
>> reject a new application or a new root from an existing CA?
>>
>
> While I certainly appreciate the comparison, I think it's apples and
> oranges when we consider both the nature and degree, nor do I think it's
> fair to suggest "in isolation" is a comparison.
>

I thought I recalled a recent case in which a new root/key was declined,
with the sole unresolved (and unresolvable, save for new key generation,
etc.) matter precluding the inclusion being a prior mis-issuance of test
certificates, already revoked and disclosed. Perhaps I am mistaken.

>
> I'm sure you can agree that incident response is defined by both the
> nature and severity of the incident itself, the surrounding ecosystem
> factors (i.e. was this a well-understood problem), and the detection,
> response, and disclosure practices that follow. A system that does not
> implement any checks whatsoever is, I hope, something we can agree is worse
> than a system that relies on human checks (and virtually indistinguishable
> from no checks), and that both are worse than a system with incomplete
> technical checks.
>

I certainly concur with all of that, which is part of the basis on which
I form my own opinion that Let's Encrypt should not suffer any consequence
of significance beyond advice along the lines of "make your testing
environment and procedures better".

> I do agree with you that I find it challenging with how the staging
> environment was tested - failure to have robust profile tests in staging,
> for example, are what ultimately resulted in Turktrust's notable
> misissuance of unconstrained CA certificates. Similarly, given the wide
> availability of certificate linting tools - such as ZLint, x509Lint,
> (AWS's) certlint, and (GlobalSign's) certlint - there's no dearth of
> availability of open tools and checks. Given the industry push towards
> integration of these automated tools, it's not entirely clear why LE would
> invent yet another, but it's also not reasonable to require that LE use
> something 'off the shelf'.
>

I'm very interested in how the testing occurs in terms of procedures. I
would assume, for example, that no test transaction of any kind would ever
be "played" against a production environment unless that same exact test
transaction had already been "played" against the staging environment.

With respect to this case, were these wildcard certificates requested and
issued against the staging system with materially the same test
transaction data, and if so was the encoding incorrect?

If these were not performed against staging, what was the rational basis
for executing a new and novel test transaction against the production
system first?

If they were performed AND if they did not encode incorrectly, then what
was the disparity between the environments which led to this? (The
implication being that some sort of change management process needs to be
revised to keep the operating environments of staging and production
better synchronized.)

If they were performed and were improperly encoded on the staging
environment, then one would presume that the erroneous result was missed
by the various automated and manual examinations of the results of the
tests.

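To make the staging-first assumption above concrete, here is a minimal
Python sketch of such a gate. It is purely illustrative and says nothing
about how Let's Encrypt actually operates: the class, its methods, and the
transaction fields ("profile", "names") are all hypothetical names invented
for this example.

```python
import hashlib
import json


def fingerprint(transaction: dict) -> str:
    """Return a stable digest of a test transaction's material inputs."""
    # Sorting keys makes the digest independent of dict insertion order.
    canonical = json.dumps(transaction, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class StagingFirstGate:
    """Hypothetical gate: a test transaction may be played against
    production only after the identical transaction was played against
    staging and its output was accepted by the checks run there."""

    def __init__(self) -> None:
        # Maps transaction fingerprint -> whether the staging output passed.
        self._staging_results: dict[str, bool] = {}

    def record_staging_run(self, transaction: dict, output_ok: bool) -> None:
        """Record the outcome of playing a transaction against staging."""
        self._staging_results[fingerprint(transaction)] = output_ok

    def may_run_in_production(self, transaction: dict) -> bool:
        """Refuse anything never seen in staging, or seen and rejected."""
        return self._staging_results.get(fingerprint(transaction), False)


if __name__ == "__main__":
    gate = StagingFirstGate()
    wildcard = {"profile": "wildcard", "names": ["*.example.test"]}
    print(gate.may_run_in_production(wildcard))  # not yet played in staging
    gate.record_staging_run(wildcard, output_ok=True)
    print(gate.may_run_in_production(wildcard))  # played and accepted
```

A gate like this only answers the first of the questions above (was the
same transaction played in staging first?). It cannot by itself detect a
disparity between the two environments or a check that wrongly accepted
the staging output.
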
As you note, it's unreasonable to require use of any particular
implementation of any particular tool. But insofar as the other tools
achieve certain results while the LE-developed tools clearly did not catch
this issue, it would appear that LE needs to better test their testing
mechanisms. While it may not be necessary for them to incorporate the
competing tools in the live issuance pipeline, it would seem advisable
that Let's Encrypt pass the output results (the certificates) of tests
within their staging environment through these various other testing tools
as part of a post-staging-deployment testing phase.

It would seem logical to take the best-of-breed tools and stack them up,
whether automatically or manually, and waterfall the final output results
of a full suite of test scenarios against the post-deployment state of the
staging environment, with a view to identifying discrepancies between the
LE tool's opinion and the external tools' opinions and reconciling those,
rejecting invalid determinations as appropriate.

>
> I'm hoping that LE can provide more details about the change management
> process and how, in light of this incident, it may change - both in terms
> of automated testing and in certificate policy review.
>
>
>> Another question this incident raised in my mind pertains to the parallel
>> staging and production environment paradigm: If one truly has the
>> 'courage of conviction' of the equivalence of the two environments, why
>> would one not perform all tests in ONLY the staging environment, with no
>> tests and nothing other than production transactions on the production
>> environment? That tests continue to be executed in the production
>> environment while holding to the notion that a fully parallel staging
>> environment is the place for tests seems to signal that confidence in the
>> staging environment is -- in some measure, however small -- limited.
>
>
> That's ... just a bad conclusion, especially for a publicly-trusted CA :)
>

I certainly agree it's possible that I've reached a bad conclusion there,
but I would like to better understand how, specifically.

Assuming the same input data set and software manipulating said data, two
systems should in general execute identically. To the extent that they do
not, my initial position would be that a significant failing of change
management of operating environment or data set or system-level matters
has occurred. I would think all of those would be issues of great concern
to a CA, if for no other reason than that they should be very, very rare.

_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy