[ 
https://issues.apache.org/jira/browse/HDDS-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746302#comment-17746302
 ] 

István Fajth commented on HDDS-9061:
------------------------------------

So what happens here is that CertificateClient init determines that a REINIT is 
needed (introduced by HDDS-7453), while the new certificate renewer service 
also tries to renew the certificate.
The renewer service fails with an interrupted exception in the client level 
which is interesting as it happens instantaneously, but this is because the 
reinit on the other hand closes the client, which shuts down the executor.
After this, the main thread fails to get a certificate signed, because there is 
no quorum in the raft of SCMs... As I see there was an SCM decommission on 
node8, and one was added on node6, but due to the race condition at restart in 
services, the SCM quorum never left the safe mode, with that the certificate 
are not getting signed, and OMs do not leave the init phase due to infinite 
retries.

The solution is to remove what we have added in HDDS-7453, as it became 
obsolete with the new certificate renewer background job.

> [ozone-cert-rotation] Certificates for few components were renewed, but not 
> for all
> -----------------------------------------------------------------------------------
>
>                 Key: HDDS-9061
>                 URL: https://issues.apache.org/jira/browse/HDDS-9061
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Soumitra Sulav
>            Priority: Critical
>             Fix For: 1.4.0
>
>         Attachments: quasar-kkgvze-1-datanode.log, 
> quasar-kkgvze-1-ozone-scm.log, quasar-kkgvze-2-datanode.log, 
> quasar-kkgvze-2-ozone-scm.log, quasar-kkgvze-6-ozone-scm.log
>
>
> Current config set under safety valve for ca cert rotation :
> {code:java}
> <property><name>hdds.x509.max.duration</name><value>P5D</value></property>
> <property><name>hdds.x509.default.duration</name><value>PT1H</value></property>
> <property><name>hdds.x509.renew.grace.duration</name><value>PT50M</value></property>
> <property><name>hdds.block.token.expiry.time</name><value>15m</value></property>
> <property><name>ozone.manager.delegation.token.renew-interval</name><value>15m</value></property>
> <property><name>ozone.manager.delegation.token.max-lifetime</name><value>30m</value></property>
> <property><name>hdds.x509.ca.rotation.check.interval</name><value>PT30M</value></property>
> {code}
> based on above config the certificates should renew at 10th minutes from 
> start by the grace period coming into play.
> Observation :
> Certificates were renewed for a few roles or instances at random.
> Primordial SCM was stopped and started at the below times :
> 2023-07-21 15:36:03,703 : STOPPED
> 2023-07-21 16:00:25,736 : STARTED
> {code:java}
> [root@quasar-kkgvze-1 ~]# ozone admin cert list | grep "dn@"
> 66603508550073199 Fri Jul 21 15:02:32 UTC 2023   Fri Jul 21 16:02:32 UTC 2023 
>   
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>  
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603509087676398 Fri Jul 21 15:02:33 UTC 2023   Fri Jul 21 16:02:33 UTC 2023 
>   
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>  
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603509333666132 Fri Jul 21 15:02:33 UTC 2023   Fri Jul 21 16:02:33 UTC 2023 
>   
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>  
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603509791941467 Fri Jul 21 15:02:33 UTC 2023   Fri Jul 21 16:02:33 UTC 2023 
>   
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>  
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603510052147817 Fri Jul 21 15:02:34 UTC 2023   Fri Jul 21 16:02:34 UTC 2023 
>   
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>  
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603510180218825 Fri Jul 21 15:02:34 UTC 2023   Fri Jul 21 16:02:34 UTC 2023 
>   
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>  
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603510614053924 Fri Jul 21 15:02:34 UTC 2023   Fri Jul 21 16:02:34 UTC 2023 
>   
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>  
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603512588421508 Fri Jul 21 15:02:36 UTC 2023   Fri Jul 21 16:02:36 UTC 2023 
>   
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>  
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66604108035744075 Fri Jul 21 15:12:32 UTC 2023   Fri Jul 21 16:12:32 UTC 2023 
>   
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>  
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66604109383514945 Fri Jul 21 15:12:33 UTC 2023   Fri Jul 21 16:12:33 UTC 2023 
>   
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>  
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66604109450607224 Fri Jul 21 15:12:33 UTC 2023   Fri Jul 21 16:12:33 UTC 2023 
>   
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>  
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66604109820303351 Fri Jul 21 15:12:33 UTC 2023   Fri Jul 21 16:12:33 UTC 2023 
>   
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>  
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66604110216800118 Fri Jul 21 15:12:34 UTC 2023   Fri Jul 21 16:12:34 UTC 2023 
>   
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>  
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to