bharatviswa504 opened a new pull request #2691: URL: https://github.com/apache/ozone/pull/2691
## What changes were proposed in this pull request?

On a cluster upgraded to a version that includes the SCM HA code, SCM fails to start when hdds.container.token.enabled is set to true. On an upgraded cluster running SCM in non-HA mode, SCMCertificateClient is not initialized and the sub-CA is not started.

This change initializes SCMCertificateClient with the RootCA cert and initializes ContainerTokenManager.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5789

## How was this patch tested?

Manually verified on a cluster.

**Before fix:**

```
2021-09-28 22:28:01,554 ERROR org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SCM start failed with exception
java.lang.NullPointerException
    at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createContainerTokenSecretManager(StorageContainerManager.java:726)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeCAnSecurityProtocol(StorageContainerManager.java:674)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:337)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:460)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:472)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:165)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:139)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:68)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:44)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1933)
    at picocli.CommandLine.access$1100(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
    at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2152)
    at picocli.CommandLine.parseWithHandlers(CommandLine.java:2530)
    at picocli.CommandLine.parseWithHandler(CommandLine.java:2465)
    at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:96)
    at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:87)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.main(StorageContainerManagerStarter.java:57)
2021-09-28 22:28:01,604 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SHUTDOWN_MSG:
```

**After fix:**

```
2021-09-28 22:29:41,598 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for dn/[email protected] (auth:KERBEROS)
2021-09-28 22:29:41,599 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for recon/[email protected] (auth:KERBEROS)
2021-09-28 22:29:41,600 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for dn/[email protected] (auth:KERBEROS)
2021-09-28 22:29:41,600 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for om/[email protected] (auth:KERBEROS)
2021-09-28 22:29:41,626 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for om/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol
2021-09-28 22:29:41,626 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for dn/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
2021-09-28 22:29:41,631 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for dn/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
2021-09-28 22:29:41,631 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for om/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol
2021-09-28 22:29:41,638 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for recon/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol
2021-09-28 22:29:41,664 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for dn/[email protected] (auth:KERBEROS)
2021-09-28 22:29:41,676 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for dn/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
2021-09-28 22:29:41,757 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for dn/[email protected] (auth:KERBEROS)
2021-09-28 22:29:41,766 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for dn/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
2021-09-28 22:29:41,926 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2021-09-28 22:29:41,941 WARN org.apache.hadoop.hdds.server.http.BaseHttpServer: SSL config ssl.server.truststore.location is missing. If ozone.https.server.keystore.resource is specified, make sure it is a relative path
2021-09-28 22:29:42,358 INFO org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl: Added a new node: /default/1d2fa315-f276-4ab5-9c38-7875ac0eaf95
2021-09-28 22:29:42,359 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Registered Data node : 1d2fa315-f276-4ab5-9c38-7875ac0eaf95{ip: 172.27.27.129, host: quasar-afcevv-5.quasar-afcevv.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858, STANDALONE=9859], networkLocation: /default, certSerialId: 13326388815505992, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}
2021-09-28 22:29:42,361 INFO org.apache.hadoop.hdds.scm.pipeline.BackgroundPipelineCreator: trigger a one-shot run on RatisPipelineUtilsThread.
2021-09-28 22:29:42,365 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: ContainerSafeModeRule rule is successfully validated
2021-09-28 22:29:42,365 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: SCM in safe mode. 1 DataNodes registered, 1 required.
2021-09-28 22:29:42,366 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: DataNodeSafeModeRule rule is successfully validated
2021-09-28 22:29:42,367 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: All SCM safe mode pre check rules have passed
2021-09-28 22:29:42,367 WARN org.apache.hadoop.hdds.server.events.EventQueue: No event handler registered for event TypedEvent{payloadType=SafeModeStatus, name='Safe mode status'}
2021-09-28 22:29:42,368 INFO org.apache.hadoop.hdds.scm.ha.SCMContext: Update SafeModeStatus from SafeModeStatus{safeModeStatus=true, preCheckPassed=false} to SafeModeStatus{safeModeStatus=true, preCheckPassed=true}.
2021-09-28 22:29:42,369 INFO org.apache.hadoop.hdds.scm.pipeline.BackgroundPipelineCreator: trigger a one-shot run on RatisPipelineUtilsThread.
2021-09-28 22:29:42,371 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: SCM in safe mode. Pipelines with at least one datanode reported count is 2, required at least one datanode reported per pipeline count is 2
2021-09-28 22:29:42,372 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: AtleastOneDatanodeReportedRule rule is successfully validated
2021-09-28 22:29:42,373 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: SCM in safe mode. Healthy pipelines reported count is 0, required healthy pipeline reported count is 1
2021-09-28 22:29:42,395 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for om/[email protected] (auth:KERBEROS)
2021-09-28 22:29:42,405 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for om/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol
2021-09-28 22:29:42,551 INFO org.apache.hadoop.hdds.server.http.BaseHttpServer: Starting Web-server for scm at: https://0.0.0.0:9877
```

**Tested basic write/get**

```
root@quasar-afcevv-4:/var/run/cloudera-scm-agent/process/1546338751-ozone-OZONE_MANAGER# ozone sh volume create /vol1
root@quasar-afcevv-4:/var/run/cloudera-scm-agent/process/1546338751-ozone-OZONE_MANAGER# ozone sh bucket create /vol1/buck1
root@quasar-afcevv-4:/var/run/cloudera-scm-agent/process/1546338751-ozone-OZONE_MANAGER# ozone sh key put /vol1/buck1/key1 /etc/hadoop/conf/ozone-site.xml
root@quasar-afcevv-4:/var/run/cloudera-scm-agent/process/1546338751-ozone-OZONE_MANAGER# ozone sh key get /vol1/buck1/key1 /tmp/dkey1
root@quasar-afcevv-4:/var/run/cloudera-scm-agent/process/1546338751-ozone-OZONE_MANAGER# cat /tmp/dkey1
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>ozone.scm.names</name>
<value>xxx</value>
</property>
<property>
```
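
For reference, the failing startup described in the summary is tied to a single setting. The fragment below is a minimal sketch of how that setting would appear in `ozone-site.xml`, assuming the usual Hadoop-style property layout seen in the `cat` output of the write/get test. Only the property name and value come from this pull request; the rest of the file is omitted.

```xml
<!-- Sketch of an ozone-site.xml fragment; surrounding properties omitted. -->
<!-- The property name is taken from the pull request description. -->
<property>
  <name>hdds.container.token.enabled</name>
  <value>true</value>
</property>
```

With this property set to true on an upgraded non-HA SCM, startup hit the NullPointerException in createContainerTokenSecretManager shown in the "Before fix" log.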
