On Tue, Mar 13, 2018 at 4:02 PM, Ryan Sleevi <r...@sleevi.com> wrote:
> On Tue, Mar 13, 2018 at 4:13 PM, Matthew Hardeman via dev-security-policy
> <email@example.com> wrote:
>> I am not at all suggesting consequences for Let's Encrypt, but rather
>> raising a question as to whether that position on new inclusions /
>> is appropriate. If these things can happen in a celebrated best-practices
>> environment, can they really in isolation be cause to reject a new
>> application or a new root from an existing CA?
> While I certainly appreciate the comparison, I think it's apples and
> oranges when we consider both the nature and degree, nor do I think it's
> fair to suggest "in isolation" is a comparison.
I thought I recalled a recent case in which a new root/key was declined
with the sole unresolved (and unresolvable, save for new key generation,
etc.) matter precluding the inclusion being a prior mis-issuance of test
certificates, already revoked and disclosed. Perhaps I am mistaken.
> I'm sure you can agree that incident response is defined by both the
> nature and severity of the incident itself, the surrounding ecosystem
> factors (i.e. was this a well-understood problem), and the detection,
> response, and disclosure practices that follow. A system that does not
> implement any checks whatsoever is, I hope, something we can agree is worse
> than a system that relies on human checks (and virtually indistinguishable
> from no checks), and that both are worse than a system with incomplete
> technical checks.
I certainly concur with all of that, which is part of the basis on
which I form my own opinion that Let's Encrypt should not suffer any
consequence of significance beyond advice along the lines of "make your
testing environment and procedures better".
> I do agree with you that I find it challenging with how the staging
> environment was tested - failure to have robust profile tests in staging,
> for example, are what ultimately resulted in Turktrust's notable
> misissuance of unconstrained CA certificates. Similarly, given the wide
> availability of certificate linting tools - such as ZLint, x509Lint,
> (AWS's) certlint, and (GlobalSign's) certlint - there's no dearth of
> availability of open tools and checks. Given the industry push towards
> integration of these automated tools, it's not entirely clear why LE would
> invent yet another, but it's also not reasonable to require that LE use
> something 'off the shelf'.
I'm very interested in how the testing occurs in terms of procedures. I
would assume, for example, that no test transaction of any kind would ever
be "played" against a production environment unless that same exact test
transaction had already been "played" against the staging environment.
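To make that assumed rule concrete, here is a minimal sketch in Python. It does not reflect Let's Encrypt's actual tooling: `StagingLedger`, `fingerprint`, and the dict representation of a test transaction are all hypothetical names chosen for illustration. The idea is simply that a harness refuses to play a test transaction against production unless an identical one was already played against staging.

```python
import hashlib
import json


def fingerprint(transaction: dict) -> str:
    """Stable hash of a test transaction (hypothetical: modeled as a plain dict)."""
    canonical = json.dumps(transaction, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class StagingLedger:
    """Remembers which test transactions have been played against staging."""

    def __init__(self) -> None:
        self._staged = set()

    def record_staging_run(self, transaction: dict) -> None:
        self._staged.add(fingerprint(transaction))

    def may_run_in_production(self, transaction: dict) -> bool:
        # Only an exact replay of an already-staged transaction is allowed.
        return fingerprint(transaction) in self._staged


ledger = StagingLedger()
wildcard_test = {"type": "new-order", "identifiers": ["*.example.com"]}
ledger.record_staging_run(wildcard_test)

print(ledger.may_run_in_production(wildcard_test))  # True
print(ledger.may_run_in_production(
    {"type": "new-order", "identifiers": ["example.com"]}))  # False
```

A gate like this only proves that the same request was staged first; whether anyone examined the staged result is a separate question.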
With respect to this case, were these wildcard certificates requested and
issued against the staging system with materially the same test transaction
data, and if so was the encoding incorrect? If these were not performed
against staging, what was the rational basis for executing a new and novel
test transaction against the production system first? If they were
performed AND if they did not encode incorrectly, then what was the
disparity between the environments which led to this? (The implication
being that some sort of change management process needs to be revised to
keep the operating environments of staging and production better
synchronized.) If they were performed and were improperly encoded on the
staging environment, then one would presume that the erroneous result was
missed by the various automated and manual examinations of the results of
As you note, it's unreasonable to require use of any particular
implementation of any particular tool. But insofar as the other tools
achieve certain results while the LE-developed tools clearly did not catch
this issue, it would appear that LE needs to better test its testing
mechanisms. While it may not be necessary to incorporate the competing
tools in the live issuance pipeline, it would seem advisable for Let's
Encrypt to pass the output (the certificates) of tests within its staging
environment through these various other testing tools as part of a
post-staging-deployment testing phase. It would seem logical to take the
best-of-breed tools, stack them up (whether automatically or manually), and
waterfall through them the final output of a full suite of test scenarios
run against the post-deployment state of the staging environment, with a
view to identifying discrepancies between the LE tool's opinion and the
external tools' opinions and reconciling those, rejecting invalid
determinations as appropriate.
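The reconciliation step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not any real tool's interface: each linter is modeled as a plain Python function from DER bytes to a list of error strings, and `in_house_lint` and `strict_external_lint` are hypothetical stand-ins. Wiring in the actual linters named earlier in the thread would require wrappers that are not shown here.

```python
from typing import Callable, Dict, List

# Assumption: a linter takes certificate DER bytes and returns error strings.
Linter = Callable[[bytes], List[str]]


def find_disagreements(cert_der: bytes,
                       in_house: Linter,
                       external: Dict[str, Linter]) -> Dict[str, List[str]]:
    """Return external findings for a certificate the in-house linter accepted."""
    if in_house(cert_der):
        return {}  # the in-house tool already rejected it; nothing to reconcile
    disagreements = {}
    for name, lint in external.items():
        errors = lint(cert_der)
        if errors:
            disagreements[name] = errors
    return disagreements


# Hypothetical stand-ins, for illustration only.
def in_house_lint(cert_der: bytes) -> List[str]:
    return []  # accepts everything, like a check that misses the bug


def strict_external_lint(cert_der: bytes) -> List[str]:
    # Toy rule: flag any input containing the marker bytes b"BAD".
    return ["suspect dNSName encoding"] if b"BAD" in cert_der else []


staged_cert = b"...BAD..."  # placeholder bytes, not a real certificate
report = find_disagreements(staged_cert, in_house_lint,
                            {"strict_external": strict_external_lint})
print(report)
```

Any non-empty report is a case where the tools disagree and a human has to decide which determination is the invalid one.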
> I'm hoping that LE can provide more details about the change management
> process and how, in light of this incident, it may change - both in terms
> of automated testing and in certificate policy review.
>> Another question this incident raised in my mind pertains to the parallel
>> staging and production environment paradigm: If one truly has the 'courage
>> of conviction' of the equivalence of the two environments, why would one
>> not perform all tests in ONLY the staging environment, with no tests and
>> nothing other than production transactions on the production environment?
>> That tests continue to be executed in the production environment while
>> holding to the notion that a fully parallel staging environment is the
>> place for tests seems to signal that confidence in the staging environment
>> is -- in some measure, however small -- limited.
> That's ... just a bad conclusion, especially for a publicly-trusted CA :)
I certainly agree it's possible that I've reached a bad conclusion there,
but I would like to better understand how, specifically. Assuming the same
input data set and software manipulating said data, two systems should in
general execute identically. To the extent that they do not, my initial
position would be that a significant failing of change management of
operating environment or data set or system level matters has occurred. I
would think all of those would be issues of great concern to a CA, if for
no other reason than that they should be very, very rare.
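One way to picture the change-management concern is a drift check between the two environments. The sketch below is only an illustration: the manifest keys and values are invented placeholders, not a description of how any CA actually tracks its deployments.

```python
from typing import Dict, Optional, Tuple


def manifest_drift(staging: Dict[str, str],
                   production: Dict[str, str]
                   ) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each differing key to its (staging, production) values."""
    keys = staging.keys() | production.keys()
    return {
        key: (staging.get(key), production.get(key))
        for key in sorted(keys)
        if staging.get(key) != production.get(key)
    }


# Placeholder manifests; every name and value here is hypothetical.
staging_manifest = {
    "ca_software": "build-41",
    "profile_config": "rev-7",
    "lint_rules": "rev-3",
}
production_manifest = {
    "ca_software": "build-41",
    "profile_config": "rev-6",
    "lint_rules": "rev-3",
}

drift = manifest_drift(staging_manifest, production_manifest)
print(drift)
```

An empty result would support the claim that the two environments should behave identically for the same input; a non-empty one points at exactly what has diverged.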
dev-security-policy mailing list