[ 
https://issues.apache.org/jira/browse/HDDS-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

István Fajth updated HDDS-7391:
-------------------------------
    Description: 
The current rootCA certificate expires somewhat over 5 years after it was 
created.
This event invalidates all certificates signed in the trust chain for which 
the rootCA certificate is the base of trust. Rotating and renewing this 
certificate is therefore time consuming, because it includes the renewal of 
all certificates at once.

In order to renew the rootCA certificate without a full security 
re-bootstrap, we would like to use the following procedure:
- before the rootCA certificate expires, we create a new rootCA certificate
- with the new rootCA certificate we rotate the sub-CA certificate of all 3 SCMs
- once that is done, we make the new rootCA certificate available for other 
services via an SCM API
- other services start polling for the new rootCA certificate at a time when 
it is most likely already generated and available via the SCM API
- once the new rootCA certificate is present, services update their 
TrustStores. After a random delay that leaves room for most if not all of the 
other services to refresh their TrustStores, every service renews its own 
certificate regardless of expiration, and gets a new certificate signed by 
the new sub-CA certificate of the leader.
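
The service-side steps above can be sketched as follows. This is a minimal 
illustration in Python, not Ozone code: FakeScm, Service, get_new_root_ca, 
sign_certificate and all parameters are hypothetical names, and keeping the 
old rootCA in the TrustStore next to the new one is an assumption.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Optional, Set


@dataclass
class FakeScm:
    """Hypothetical in-memory stand-in for the SCM API (not Ozone code)."""
    new_root_ca: Optional[str] = None  # None until the new rootCA is published

    def get_new_root_ca(self) -> Optional[str]:
        return self.new_root_ca

    def sign_certificate(self, service_id: str) -> str:
        # Assumes the sub-CA certificates were already rotated under the new rootCA.
        return f"cert({service_id}) under {self.new_root_ca}"


@dataclass
class Service:
    service_id: str
    trust_store: Set[str] = field(default_factory=set)
    certificate: str = ""

    def rotate(self, scm: FakeScm, max_polls: int, max_delay_s: float,
               sleep: Callable[[float], None] = lambda s: None,
               rng: Optional[random.Random] = None) -> bool:
        """Poll for the new rootCA, trust it, wait a random delay, then renew."""
        rng = rng if rng is not None else random.Random()
        root = None
        for _ in range(max_polls):
            root = scm.get_new_root_ca()
            if root is not None:
                break
            sleep(1.0)  # illustrative poll interval
        if root is None:
            return False  # not published yet; keep the current certificate
        # Assumption: the old rootCA stays trusted alongside the new one.
        self.trust_store.add(root)
        # Give other services time to refresh their TrustStores.
        sleep(rng.uniform(0.0, max_delay_s))
        # Renew regardless of the current certificate's expiration.
        self.certificate = scm.sign_certificate(self.service_id)
        return True
```

The sleep parameter defaults to a no-op so the sketch runs instantly; passing 
time.sleep would make it actually wait.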

During this process all services start polling for the rootCA certificate at 
around the same time. The request is short and the response payload is the 
rootCA certificate only, so the SCM might experience only a short peak in 
load; we might want to introduce jitter for this if necessary.

Issuing the new certificates during this process is a resource-intensive task 
on the leader SCM, so we definitely want to introduce jitter there. The 
jitter should be configurable, so that this period can be shortened for 
testing.
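
A configurable jitter of this kind could look roughly like the sketch below. 
It is written in Python for brevity and assumes a uniform random delay; the 
function name, the parameter name and the example values are hypothetical and 
do not correspond to any existing Ozone configuration key.

```python
import random
from typing import Optional


def renewal_delay_seconds(max_jitter_s: float,
                          rng: Optional[random.Random] = None) -> float:
    """Pick a uniformly random delay in [0, max_jitter_s] before renewing."""
    if max_jitter_s < 0:
        raise ValueError("max_jitter_s must be non-negative")
    rng = rng if rng is not None else random.Random()
    return rng.uniform(0.0, max_jitter_s)


# Hypothetical settings: a wide window in production spreads the signing load
# on the leader SCM, a narrow one keeps test runs short.
PRODUCTION_MAX_JITTER_S = 3600.0
TEST_MAX_JITTER_S = 2.0

production_delay = renewal_delay_seconds(PRODUCTION_MAX_JITTER_S)
test_delay = renewal_delay_seconds(TEST_MAX_JITTER_S)
```

With N services and a window of W seconds, the leader would see on average 
about N/W signing requests per second instead of N at once.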

More information on the failure scenarios and the whole process can be found in 
the attached PDF document.


  was:
The current rootCA certificate expires somewhat over 5 years after it was 
created.
This event invalidates all certificates signed in the trust chain for which 
the rootCA certificate is the base of trust. Rotating and renewing this 
certificate is therefore time consuming, because it includes the renewal of 
all certificates at once.

In order to renew the rootCA certificate, instead of a full security 
re-bootstrap we would like to follow the following procedure:
- before any certificate gets an expiration date later than the rootCA 
expiration date, we need to create a new rootCA certificate and start using 
it as the root of trust for new certificates
- while the old rootCA certificate is still valid, we need to ensure that 
both rootCA certificates are distributed to the trust stores
- the new rootCA certificate has to be created before any subordinate CA 
certificate is renewed
- creating the new rootCA certificate should trigger the rotation of all 
subordinate CA certificates active in the system, and the new subordinate CA 
certificates have to be signed by the new rootCA certificate.

Notes:
Let's see an example of how this may happen:
- let's say we have regular certificates valid for n-days in our system, this 
is defined by configuration
- n+2 days before the rootCA certificate expiration date, we can only have 
subordinate CA certificates that are expiring in n+2 or more days (rootCA and 
subordinate CA certificates have the same expiration period)
- every certificate is renewed on the day before the day when the certificate 
expires
On the day n+2 days before the rootCA certificate expiration:
- we create the new rootCA certificate, and refresh the trust stores in the 
system to contain both the old and the new rootCA certificates
- we create the new subordinate CA certificates, and reset the CA server 
subsystem in SCMs to use the new subordinate CA certificate for certificate 
signature
- any new CSR is signed by the new subordinate CA certificates from this point 
on
This ensures that all the certificates to be renewed are renewed as part of the 
new chain of trust for which the rootCA certificate is the new one, while the 
certificates that are inheriting trust from the old rootCA certificate are 
still trusted.

On the day when the old rootCA certificate expires we do not need to do 
anything except remove the old rootCA certificate from the SCM's metadata, as:
- every certificate inheriting trust from it is already renewed, or will be 
renewed immediately at startup of a service that was not live at renewal 
time.
- all certificates that may still linger are untrusted, as the rootCA 
certificate has expired.
- truststores are built up at startup, so they will eventually forget the old 
rootCA once they are restarted



> Automated live rotation of CA certificates in a cluster with established trust
> ------------------------------------------------------------------------------
>
>                 Key: HDDS-7391
>                 URL: https://issues.apache.org/jira/browse/HDDS-7391
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Security
>            Reporter: István Fajth
>            Assignee: István Fajth
>            Priority: Blocker
>              Labels: pki
>         Attachments: CA_cert_rotation_design.pdf
>
>


