bharatviswa504 opened a new pull request #2691:
URL: https://github.com/apache/ozone/pull/2691


   ## What changes were proposed in this pull request?
   
   On an upgraded cluster running the SCM HA code version, SCM fails to start when 
hdds.container.token.enabled is set to true.
   On an upgraded cluster where SCM is non-HA, SCMCertificateClient is not initialized 
and the sub-CA is not started, so creating the container token secret manager 
throws a NullPointerException. This change initializes SCMCertificateClient with 
the RootCA cert and then initializes ContainerTokenManager.
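
The failure mode and the fix can be illustrated with a minimal, self-contained sketch. Every name below (`ScmStartupSketch`, `CertClient`, `TokenManager`, `start`, and the placeholder certificate strings) is hypothetical and is not the Ozone API. The sketch assumes only what the description states: in the non-HA upgraded path no certificate client exists when the token manager is created, and the fix builds one from the root CA certificate first.

```java
import java.util.Objects;

/** Hypothetical sketch only; these are not the real Ozone classes. */
public class ScmStartupSketch {

  /** Stand-in for a certificate client that holds a CA certificate. */
  public static final class CertClient {
    private final String caCert;

    public CertClient(String caCert) {
      this.caCert = Objects.requireNonNull(caCert, "caCert");
    }

    public String getCaCert() {
      return caCert;
    }
  }

  /** Stand-in for the container token manager. */
  public static final class TokenManager {
    private final String signingCert;

    public TokenManager(CertClient client) {
      // Dereferences the client, so a null client fails here with an NPE.
      this.signingCert = client.getCaCert();
    }

    public String getSigningCert() {
      return signingCert;
    }
  }

  /**
   * Simulates SCM startup. The haEnabled branch is an assumption made for
   * contrast; the non-HA branch models the case described in this PR.
   */
  public static TokenManager start(boolean haEnabled, boolean tokenEnabled,
      boolean applyFix) {
    CertClient client = null;
    if (haEnabled) {
      client = new CertClient("sub-ca-cert");   // placeholder value
    } else if (applyFix) {
      client = new CertClient("root-ca-cert");  // placeholder value
    }
    // Without the fix, the non-HA path reaches this line with client == null.
    return tokenEnabled ? new TokenManager(client) : null;
  }

  public static void main(String[] args) {
    try {
      start(false, true, false);
    } catch (NullPointerException e) {
      System.out.println("before fix: NullPointerException");
    }
    System.out.println("after fix: "
        + start(false, true, true).getSigningCert());
  }
}
```

In this sketch the token manager is created only when the token flag is on, which mirrors why the failure appears only when hdds.container.token.enabled is true.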
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5789
   
   ## How was this patch tested?
   
   Manually verified this on a cluster.
   
   **Before fix:**
   ```
   2021-09-28 22:28:01,554 ERROR org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SCM start failed with exception
   java.lang.NullPointerException
           at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createContainerTokenSecretManager(StorageContainerManager.java:726)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeCAnSecurityProtocol(StorageContainerManager.java:674)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:337)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:460)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:472)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:165)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:139)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:68)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:44)
           at picocli.CommandLine.executeUserObject(CommandLine.java:1933)
           at picocli.CommandLine.access$1100(CommandLine.java:145)
           at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
           at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
           at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
           at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2152)
           at picocli.CommandLine.parseWithHandlers(CommandLine.java:2530)
           at picocli.CommandLine.parseWithHandler(CommandLine.java:2465)
           at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:96)
           at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:87)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.main(StorageContainerManagerStarter.java:57)
   2021-09-28 22:28:01,604 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SHUTDOWN_MSG:
   ```
   
   **After fix:**
   ```
   2021-09-28 22:29:41,598 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for dn/[email protected] (auth:KERBEROS)
   2021-09-28 22:29:41,599 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for recon/[email protected] (auth:KERBEROS)
   2021-09-28 22:29:41,600 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for dn/[email protected] (auth:KERBEROS)
   2021-09-28 22:29:41,600 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for om/[email protected] (auth:KERBEROS)
   2021-09-28 22:29:41,626 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for om/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol
   2021-09-28 22:29:41,626 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for dn/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
   2021-09-28 22:29:41,631 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for dn/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
   2021-09-28 22:29:41,631 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for om/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol
   2021-09-28 22:29:41,638 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for recon/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol
   2021-09-28 22:29:41,664 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for dn/[email protected] (auth:KERBEROS)
   2021-09-28 22:29:41,676 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for dn/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
   2021-09-28 22:29:41,757 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for dn/[email protected] (auth:KERBEROS)
   2021-09-28 22:29:41,766 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for dn/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
   2021-09-28 22:29:41,926 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
   2021-09-28 22:29:41,941 WARN org.apache.hadoop.hdds.server.http.BaseHttpServer: SSL config ssl.server.truststore.location is missing. If ozone.https.server.keystore.resource is specified, make sure it is a relative path
   2021-09-28 22:29:42,358 INFO org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl: Added a new node: /default/1d2fa315-f276-4ab5-9c38-7875ac0eaf95
   2021-09-28 22:29:42,359 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Registered Data node : 1d2fa315-f276-4ab5-9c38-7875ac0eaf95{ip: 172.27.27.129, host: quasar-afcevv-5.quasar-afcevv.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858, STANDALONE=9859], networkLocation: /default, certSerialId: 13326388815505992, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}
   2021-09-28 22:29:42,361 INFO org.apache.hadoop.hdds.scm.pipeline.BackgroundPipelineCreator: trigger a one-shot run on RatisPipelineUtilsThread.
   2021-09-28 22:29:42,365 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: ContainerSafeModeRule rule is successfully validated
   2021-09-28 22:29:42,365 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: SCM in safe mode. 1 DataNodes registered, 1 required.
   2021-09-28 22:29:42,366 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: DataNodeSafeModeRule rule is successfully validated
   2021-09-28 22:29:42,367 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: All SCM safe mode pre check rules have passed
   2021-09-28 22:29:42,367 WARN org.apache.hadoop.hdds.server.events.EventQueue: No event handler registered for event TypedEvent{payloadType=SafeModeStatus, name='Safe mode status'}
   2021-09-28 22:29:42,368 INFO org.apache.hadoop.hdds.scm.ha.SCMContext: Update SafeModeStatus from SafeModeStatus{safeModeStatus=true, preCheckPassed=false} to SafeModeStatus{safeModeStatus=true, preCheckPassed=true}.
   2021-09-28 22:29:42,369 INFO org.apache.hadoop.hdds.scm.pipeline.BackgroundPipelineCreator: trigger a one-shot run on RatisPipelineUtilsThread.
   2021-09-28 22:29:42,371 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: SCM in safe mode. Pipelines with at least one datanode reported count is 2, required at least one datanode reported per pipeline count is 2
   2021-09-28 22:29:42,372 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: AtleastOneDatanodeReportedRule rule is successfully validated
   2021-09-28 22:29:42,373 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: SCM in safe mode. Healthy pipelines reported count is 0, required healthy pipeline reported count is 1
   2021-09-28 22:29:42,395 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for om/[email protected] (auth:KERBEROS)
   2021-09-28 22:29:42,405 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for om/[email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol
   2021-09-28 22:29:42,551 INFO org.apache.hadoop.hdds.server.http.BaseHttpServer: Starting Web-server for scm at: https://0.0.0.0:9877
   ```
   
   **Tested basic write/get**
   ```
   root@quasar-afcevv-4:/var/run/cloudera-scm-agent/process/1546338751-ozone-OZONE_MANAGER# ozone sh volume create /vol1
   root@quasar-afcevv-4:/var/run/cloudera-scm-agent/process/1546338751-ozone-OZONE_MANAGER# ozone sh bucket create /vol1/buck1
   root@quasar-afcevv-4:/var/run/cloudera-scm-agent/process/1546338751-ozone-OZONE_MANAGER# ozone sh key put /vol1/buck1/key1 /etc/hadoop/conf/ozone-site.xml
   root@quasar-afcevv-4:/var/run/cloudera-scm-agent/process/1546338751-ozone-OZONE_MANAGER# ozone sh key get /vol1/buck1/key1 /tmp/dkey1
   root@quasar-afcevv-4:/var/run/cloudera-scm-agent/process/1546338751-ozone-OZONE_MANAGER# cat /tmp/dkey1
   <?xml version="1.0" encoding="UTF-8"?>
   <!--Autogenerated by Cloudera Manager-->
   <configuration>
     <property>
       <name>ozone.scm.names</name>
       <value>xxx</value>
     </property>
     <property>
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


