[ https://issues.apache.org/jira/browse/HDDS-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
István Fajth updated HDDS-7391:
-------------------------------
Description:
The current rootCA certificate expires somewhat over 5 years after it was
created.
This event invalidates all certificates that are signed in the trust chain for
which the rootCA certificate is the base of trust. As a result, rotating and
renewing this certificate all at once is time consuming, as it includes the
renewal of all certificates.
In order to renew the rootCA certificate without a full security
re-bootstrap, we would like to follow this procedure:
- before the rootCA certificate expires, we create a new rootCA certificate
- with the new rootCA certificate, we rotate the sub-CA certificates of all 3
SCMs
- once that is done, we make the new rootCA certificate available to other
services via an SCM API
- other services start polling for the new rootCA certificate at a time when it
is most likely already generated and available via the SCM API
- once the new rootCA certificate is present, services update their TrustStores;
then, after a random delay that leaves room for most if not all of the other
services to refresh their TrustStores, every service renews its own certificate
regardless of expiration, and gets a new certificate signed by the new sub-CA
certificate of the leader SCM.
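The service-side part of the steps above can be sketched as follows. This is a minimal sketch, not Ozone code: the RootCaRotationHooks interface, its methods, the maxPolls bound, and the string ids standing in for certificates are all hypothetical, chosen only to show the ordering (poll, trust, random delay, renew).

```java
import java.util.Random;

/** Hypothetical hooks a service would need; these are NOT existing Ozone APIs. */
interface RootCaRotationHooks {
  /** Returns the id of the newest rootCA published via SCM, or null if none yet. */
  String fetchLatestRootCaId();
  /** Adds the given rootCA certificate to the local TrustStore. */
  void addToTrustStore(String rootCaId);
  /** Requests a new certificate signed by the new sub-CA of the leader SCM. */
  void renewOwnCertificate();
  /** Sleeps for the given number of milliseconds (injectable for tests). */
  void sleepMillis(long millis);
}

final class RootCaRotationFlow {
  private RootCaRotationFlow() {}

  /**
   * Polls until a rootCA different from the known one appears, trusts it, waits
   * a random delay of up to maxRenewDelayMillis, then renews the service's own
   * certificate regardless of its expiration. Returns the new rootCA id, or null
   * if no new rootCA was published within maxPolls attempts.
   */
  static String rotate(RootCaRotationHooks hooks, String knownRootId, int maxPolls,
      long pollIntervalMillis, long maxRenewDelayMillis, Random random) {
    String latest = null;
    for (int attempt = 0; attempt < maxPolls; attempt++) {
      latest = hooks.fetchLatestRootCaId();
      if (latest != null && !latest.equals(knownRootId)) {
        break; // a new rootCA is available
      }
      latest = null;
      hooks.sleepMillis(pollIntervalMillis);
    }
    if (latest == null) {
      return null; // not published yet; the caller may try again later
    }
    hooks.addToTrustStore(latest);
    // The random delay gives other services time to refresh their TrustStores
    // before certificates signed by the new chain start to appear.
    long delay = maxRenewDelayMillis <= 0
        ? 0 : (long) (random.nextDouble() * maxRenewDelayMillis);
    hooks.sleepMillis(delay);
    hooks.renewOwnCertificate();
    return latest;
  }
}
```

Injecting the sleep through the hooks keeps the sketch deterministic under test, which matches the description's wish to shorten these periods when testing.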
During this process, all services start polling for the rootCA certificate at
around the same time. The request is short and the response payload is only the
rootCA certificate, but SCM might still experience a short load peak here, so we
might want to introduce a jitter for the polling if necessary.
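A polling jitter of the kind mentioned above could look like the sketch below, assuming each service derives its first poll time from a base delay plus a uniformly random offset. PollJitter and its parameters are hypothetical names, not existing Ozone configuration.

```java
import java.util.Random;

/** Hypothetical helper that spreads the first rootCA poll of each service over a window. */
final class PollJitter {
  private PollJitter() {}

  /**
   * Returns the delay before the first poll: baseDelayMillis plus a uniformly
   * random offset in [0, jitterWindowMillis). A window of 0 disables the jitter.
   */
  static long firstPollDelayMillis(long baseDelayMillis, long jitterWindowMillis, Random random) {
    if (baseDelayMillis < 0 || jitterWindowMillis < 0) {
      throw new IllegalArgumentException("delays must be non-negative");
    }
    if (jitterWindowMillis == 0) {
      return baseDelayMillis;
    }
    // nextDouble() is in [0, 1); the clamp guards against floating-point
    // rounding producing exactly jitterWindowMillis.
    long offset = Math.min((long) (random.nextDouble() * jitterWindowMillis),
        jitterWindowMillis - 1);
    return baseDelayMillis + offset;
  }
}
```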
Issuing the new certificates is a resource-intensive task on the leader SCM,
so we definitely want to introduce a jitter there, and make it configurable so
that this period can be shortened for testing.
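A configurable renewal jitter could be read from configuration roughly as below. This is a sketch only: the property key example.rootca.renewal.max-jitter and the one-hour default are hypothetical placeholders, not real Ozone settings, and java.util.Properties stands in for Ozone's own configuration classes.

```java
import java.time.Duration;
import java.util.Properties;
import java.util.Random;

/** Hypothetical sketch of a configurable upper bound for the renewal delay. */
final class RenewalJitter {
  static final String MAX_JITTER_KEY = "example.rootca.renewal.max-jitter"; // hypothetical key
  static final Duration DEFAULT_MAX_JITTER = Duration.ofHours(1);           // hypothetical default

  private RenewalJitter() {}

  /** Reads the maximum jitter as an ISO-8601 duration such as "PT30M" or "PT5S". */
  static Duration maxJitter(Properties conf) {
    String raw = conf.getProperty(MAX_JITTER_KEY);
    Duration max = raw == null ? DEFAULT_MAX_JITTER : Duration.parse(raw.trim());
    if (max.isNegative()) {
      throw new IllegalArgumentException(MAX_JITTER_KEY + " must not be negative");
    }
    return max;
  }

  /** Picks a delay uniformly from [0, max); a zero max means renew immediately. */
  static Duration pickDelay(Properties conf, Random random) {
    long maxMillis = maxJitter(conf).toMillis();
    if (maxMillis == 0) {
      return Duration.ZERO;
    }
    long millis = Math.min((long) (random.nextDouble() * maxMillis), maxMillis - 1);
    return Duration.ofMillis(millis);
  }
}
```

Setting the key to something like PT5S in a test cluster would compress the renewal wave into a few seconds, while production keeps a wider spread to protect the leader SCM.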
More information on the failure scenarios and the whole process can be found in
the attached pdf document.
was:
The current rootCA certificate expires somewhat over 5 years after it was
created.
This event invalidates all certificates that are signed in the trust chain for
which the rootCA certificate is the base of trust. As a result, rotating and
renewing this certificate all at once is time consuming, as it includes the
renewal of all certificates.
In order to renew the rootCA certificate without a full security
re-bootstrap, we would like to follow this procedure:
- before any certificate would get an expiration date later than the rootCA
expiration date, we need to create a new rootCA certificate and start using it
as the root of trust for new certificates
- while the old rootCA certificate is still valid, we need to ensure that both
rootCA certificates are distributed to the trust stores
- creating the new rootCA certificate has to happen before the renewal of any
subordinate CA certificates
- creating the new rootCA certificate should trigger the rotation of all
subordinate CA certificates active in the system, and the new subordinate CA
certificates have to be signed by the new rootCA certificate.
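The first condition in the list above reduces to a date comparison: a certificate issued now must not outlive the rootCA it chains to. A minimal sketch, assuming the validity period of newly issued certificates is known; RootCaRenewalCheck is a hypothetical name.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical check for the "new rootCA needed" precondition. */
final class RootCaRenewalCheck {
  private RootCaRenewalCheck() {}

  /**
   * Returns true if a certificate issued at {@code now} with the given validity
   * would expire after the current rootCA certificate, i.e. new certificates
   * must already be issued under a new rootCA.
   */
  static boolean newRootRequired(Instant now, Duration certValidity, Instant rootNotAfter) {
    return now.plus(certValidity).isAfter(rootNotAfter);
  }
}
```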
Notes:
Let's see an example of how this may happen:
- let's say regular certificates in our system are valid for n days, as defined
by configuration
- n+2 days before the rootCA certificate expiration date, we can only have
subordinate CA certificates that expire in n+2 or more days (rootCA and
subordinate CA certificates have the same validity period)
- every certificate is renewed on the day before it expires
On the day n+2 days before the rootCA certificate expiration:
- we create the new rootCA certificate, and refresh the trust stores in the
system to contain both the old and new rootCA certificate
- we create the new subordinate CA certificates, and reset the CA server
subsystem in SCMs to use the new subordinate CA certificate for certificate
signature
- any new CSR is signed by the new subordinate CA certificates from this point
on
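The example's timeline is plain date arithmetic. The sketch below is illustrative only: RotationTimeline is a hypothetical name, and it conservatively assumes the old chain may still sign certificates on the rotation day itself.

```java
import java.time.LocalDate;

/** Worked example of the n+2 day timeline; all names are illustrative. */
final class RotationTimeline {
  private RotationTimeline() {}

  /** Day on which the new rootCA and sub-CA certificates are created: n+2 days before expiry. */
  static LocalDate rotationDay(LocalDate rootExpiry, int n) {
    return rootExpiry.minusDays(n + 2L);
  }

  /**
   * Latest expiry of a regular certificate signed under the old chain, assuming
   * it was issued no later than the rotation day and is valid for n days.
   */
  static LocalDate lastOldChainLeafExpiry(LocalDate rootExpiry, int n) {
    return rotationDay(rootExpiry, n).plusDays(n);
  }
}
```

With a rootCA expiring on 2028-01-31 and n = 30 (both made-up values), rotationDay returns 2027-12-30 and lastOldChainLeafExpiry returns 2028-01-29, so every leaf certificate from the old chain expires two days before the old rootCA does.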
This ensures that every certificate renewed from this point on becomes part of
the new chain of trust rooted in the new rootCA certificate, while certificates
that inherit trust from the old rootCA certificate are still trusted.
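Why both rootCA certificates must sit in the trust stores during the overlap can be shown with a toy model. This is deliberately not X.509 path validation: TrustStoreModel is hypothetical, roots are identified by name, and a leaf is reduced to its issuing root and expiry date.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

/** Toy trust store holding several rootCA entries (id -> notAfter); illustration only. */
final class TrustStoreModel {
  private final Map<String, LocalDate> rootExpiry = new HashMap<>();

  void addRoot(String rootId, LocalDate notAfter) {
    rootExpiry.put(rootId, notAfter);
  }

  /** A leaf is trusted if its root is present and neither the root nor the leaf has expired. */
  boolean isTrusted(String issuingRootId, LocalDate leafNotAfter, LocalDate today) {
    LocalDate rootNotAfter = rootExpiry.get(issuingRootId);
    return rootNotAfter != null
        && !today.isAfter(rootNotAfter)
        && !today.isAfter(leafNotAfter);
  }
}
```

During the overlap a service holding both roots accepts peers that have already renewed under the new chain as well as peers still presenting old-chain certificates.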
On the day when the old rootCA certificate expires, we do not need to do
anything except remove the old rootCA certificate from the SCM's metadata, as:
- every certificate inheriting trust from it has already been renewed, or will
be renewed immediately at startup by any service that was not live at renewal
time
- any certificates that may still linger are untrusted, as the old rootCA
certificate has expired
- TrustStores are built at startup, so services eventually forget the old
rootCA certificate once they are restarted
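The cleanup step amounts to dropping rootCA entries whose expiry date has passed. A minimal sketch, assuming the SCM metadata can be viewed as a map from rootCA id to expiry date; ExpiredRootCleanup is a hypothetical name.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical cleanup: drop rootCA entries whose expiry date has passed. */
final class ExpiredRootCleanup {
  private ExpiredRootCleanup() {}

  /** Returns a copy of roots (id -> notAfter) without entries expired as of today. */
  static Map<String, LocalDate> withoutExpired(Map<String, LocalDate> roots, LocalDate today) {
    Map<String, LocalDate> kept = new TreeMap<>(roots);
    // A root is still valid on its notAfter day and is removed from the day after.
    kept.values().removeIf(today::isAfter);
    return kept;
  }
}
```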
> Automated live rotation of CA certificates in a cluster with established trust
> ------------------------------------------------------------------------------
>
> Key: HDDS-7391
> URL: https://issues.apache.org/jira/browse/HDDS-7391
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Security
> Reporter: István Fajth
> Assignee: István Fajth
> Priority: Blocker
> Labels: pki
> Attachments: CA_cert_rotation_design.pdf
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)