Neil Joshi created HDDS-7985:
--------------------------------
Summary: [SCM HA] On SCM Disk failure recovery causes Datanode
Failure on startup
Key: HDDS-7985
URL: https://issues.apache.org/jira/browse/HDDS-7985
Project: Apache Ozone
Issue Type: Bug
Reporter: Neil Joshi
Recovery from an SCM disk failure when no backup is available requires:
* Cleaning the _ozone.scm.db.dirs_ and _ozone.metadata.dirs_ locations
and bootstrapping the SCM.
Whether or not the SCM is primordial, an error occurs when starting a
datanode after the SCM has been recovered from a failed disk with no backup.
Datanodes brought up after SCM disk failure recovery are unable to start due to
a CA certificate error stating that the number of certificates received
from the SCM is greater than the number expected:
{code:java}
ozonesecure-ha-datanode1-1 | 2023-02-17 00:46:40 INFO HAUtils:457 - Expected
CA list size 4, where as received CA List size 5.{code}
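To illustrate why one extra certificate blocks startup, here is a minimal sketch of a strict size check that would produce the message above. It is not the actual HAUtils code: the class name CaListCheck, the method verifyCaList, the exception type, and the placeholder certificate names are all hypothetical. It also assumes that the expected size comes from configuration (for example, one root CA plus one certificate per configured SCM) and that the comparison is an exact equality.

```java
import java.util.List;

/** Hypothetical sketch of a CA list size check; not the real HAUtils code. */
final class CaListCheck {

  /** Raised when the received CA list size differs from the expected size. */
  static final class CaListMismatchException extends RuntimeException {
    CaListMismatchException(String message) {
      super(message);
    }
  }

  /**
   * Returns the received list unchanged when its size equals expectedCount.
   * Any other size, larger or smaller, is rejected (assumed strict equality).
   */
  static List<String> verifyCaList(List<String> received, int expectedCount) {
    if (received.size() != expectedCount) {
      throw new CaListMismatchException(String.format(
          "Expected CA list size %d, where as received CA List size %d.",
          expectedCount, received.size()));
    }
    return received;
  }

  public static void main(String[] args) {
    // Placeholder names: a root CA, three SCMs, and one stale SCM2 entry.
    List<String> received =
        List.of("root-ca", "scm1", "scm2-stale", "scm2-new", "scm3");
    try {
      verifyCaList(received, 4);
    } catch (CaListMismatchException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Under these assumptions a stale certificate left behind for the recovered SCM is enough to fail the check, since the received list never shrinks back to the expected size.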
In this case, listing the certificates stored by the SCM reports a
total of 5 SCM certificates after SCM2 recovers from disk failure:
{code:java}
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
{code}
It appears to have 2 entries for SCM2 (the node recovered from disk failure).
bash-4.2$ ozone admin cert list
{code:java}
Total 12 valid certificates:
SerialNumber Valid From Expiry
Subject
1 Fri Feb 17 00:00:00 UTC 2023 Mon Mar 27 00:00:00 UTC 2028
O=CID-abb46225-77ba-4132-ac6e-96792b40450c,
OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, [email protected]
10760186198072 Fri Feb 17 00:00:00 UTC 2023 Mon Mar 27 00:00:00 UTC 2028
O=CID-abb46225-77ba-4132-ac6e-96792b40450c,
OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, [email protected]
10779888473070 Fri Feb 17 00:00:00 UTC 2023 Sat Feb 17 00:00:00 UTC 2024
O=CID-abb46225-77ba-4132-ac6e-96792b40450c,
OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=recon@recon
10780166036417 Fri Feb 17 00:00:00 UTC 2023 Mon Mar 27 00:00:00 UTC 2028
O=CID-abb46225-77ba-4132-ac6e-96792b40450c,
OU=f99f1a81-7cce-44c9-a09b-9f7bbc48b6ac, [email protected]
10788394717480 Fri Feb 17 00:00:00 UTC 2023 Mon Mar 27 00:00:00 UTC 2028
O=CID-abb46225-77ba-4132-ac6e-96792b40450c,
OU=598be6bc-7d86-4cab-84dc-668a162a7ec2, [email protected]
10800769855768 Fri Feb 17 00:00:00 UTC 2023 Sat Feb 17 00:00:00 UTC 2024
O=CID-abb46225-77ba-4132-ac6e-96792b40450c,
OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=dn@bd3138308a3f
10801305457014 Fri Feb 17 00:00:00 UTC 2023 Sat Feb 17 00:00:00 UTC 2024
O=CID-abb46225-77ba-4132-ac6e-96792b40450c,
OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=dn@e4795cc77124
10801871334038 Fri Feb 17 00:00:00 UTC 2023 Sat Feb 17 00:00:00 UTC 2024
O=CID-abb46225-77ba-4132-ac6e-96792b40450c,
OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=dn@3eb28ff965a1
10803980992569 Fri Feb 17 00:00:00 UTC 2023 Sat Feb 17 00:00:00 UTC 2024
O=CID-abb46225-77ba-4132-ac6e-96792b40450c,
OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=om2
10804543987939 Fri Feb 17 00:00:00 UTC 2023 Sat Feb 17 00:00:00 UTC 2024
O=CID-abb46225-77ba-4132-ac6e-96792b40450c,
OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=om3
10806118720884 Fri Feb 17 00:00:00 UTC 2023 Sat Feb 17 00:00:00 UTC 2024
O=CID-abb46225-77ba-4132-ac6e-96792b40450c,
OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=om1
10932809284268 Fri Feb 17 00:00:00 UTC 2023 Mon Mar 27 00:00:00 UTC 2028
O=CID-abb46225-77ba-4132-ac6e-96792b40450c,
OU=b4a175f3-c6a4-47fd-bcc5-c081b03de8c7, [email protected] {code}
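As a rough aid for spotting the duplicate in a listing like this one, the following sketch groups certificate serial numbers by subject CN and reports any CN that holds more than one valid certificate. The class DuplicateCertFinder, its CertEntry type, and the serial numbers and CN values in main are hypothetical placeholders, not values from the output above. It assumes the stale and the newly issued SCM certificate share the same CN even though their OU differs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that flags CNs holding more than one certificate. */
final class DuplicateCertFinder {

  /** Minimal view of one listed certificate: serial number and subject CN. */
  static final class CertEntry {
    final String serial;
    final String commonName;

    CertEntry(String serial, String commonName) {
      this.serial = serial;
      this.commonName = commonName;
    }
  }

  /** Maps each CN that appears more than once to all of its serial numbers. */
  static Map<String, List<String>> findDuplicates(List<CertEntry> entries) {
    Map<String, List<String>> byName = new LinkedHashMap<>();
    for (CertEntry entry : entries) {
      byName.computeIfAbsent(entry.commonName, k -> new ArrayList<>())
          .add(entry.serial);
    }
    // Drop every CN that has only a single certificate.
    byName.values().removeIf(serials -> serials.size() < 2);
    return byName;
  }

  public static void main(String[] args) {
    // Placeholder entries for illustration only.
    List<CertEntry> entries = List.of(
        new CertEntry("1001", "scm-sub@scm1"),
        new CertEntry("1002", "scm-sub@scm2"),
        new CertEntry("1003", "scm-sub@scm3"),
        new CertEntry("1004", "scm-sub@scm2"));
    System.out.println(findDuplicates(entries));
  }
}
```

If the CNs turn out to differ between the old and new entries, the same grouping could be keyed on hostname instead.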