bharatviswa504 opened a new pull request #2312: URL: https://github.com/apache/ozone/pull/2312
## What changes were proposed in this pull request?

On the SCM side, if the exception is an `SCMSecurityException` with error code `NOT_A_PRIMARY_SCM`, return a `RetriableWithFailOverException` instead. The FailOverProxyProvider then fails over and retries the request against the next SCM.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5317

## How was this patch tested?

Tested manually on docker-compose: changed the order of the node IDs to scm2,scm3,scm1 and started SCM3, so that it connects to scm2 first, then checked whether it is able to bootstrap.

SCM3 connected to SCM2, and SCM2 threw `RetriableWithFailOverException`:

```
scm2.org_1 | org.apache.hadoop.hdds.scm.ha.RetriableWithFailOverException: org.apache.hadoop.hdds.security.exception.SCMSecurityException: Get SCM Certificate can be run only primary SCM
scm2.org_1 | 	at org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:206)
scm2.org_1 | 	at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.processRequest(SCMSecurityProtocolServerSideTranslatorPB.java:157)
scm2.org_1 | 	at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
scm2.org_1 | 	at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.submitRequest(SCMSecurityProtocolServerSideTranslatorPB.java:97)
scm2.org_1 | 	at org.apache.hadoop.hdds.protocol.proto.SCMSecurityProtocolProtos$SCMSecurityProtocolService$2.callBlockingMethod(SCMSecurityProtocolProtos.java:15124)
scm2.org_1 | 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
scm2.org_1 | 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
scm2.org_1 | 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
scm2.org_1 | 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957)
scm2.org_1 | 	at java.base/java.security.AccessController.doPrivileged(Native Method)
scm2.org_1 | 	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
scm2.org_1 | 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
scm2.org_1 | 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957)
scm2.org_1 | Caused by: org.apache.hadoop.hdds.security.exception.SCMSecurityException: Get SCM Certificate can be run only primary SCM
scm2.org_1 | 	at org.apache.hadoop.hdds.scm.server.SCMSecurityProtocolServer.getSCMCertificate(SCMSecurityProtocolServer.java:200)
scm2.org_1 | 	at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.getSCMCertificate(SCMSecurityProtocolServerSideTranslatorPB.java:228)
scm2.org_1 | 	at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.processRequest(SCMSecurityProtocolServerSideTranslatorPB.java:127)
scm2.org_1 | 	... 11 more
```

SCM3 bootstrap is successful.

```
scm3.org_1 | 2021-06-08 08:11:53,076 [main] INFO server.StorageContainerManager: SCM BootStrap is successful for ClusterID CID-74d4b242-a5d7-4b07-8677-f75f0207c0e8, SCMID d7a4c94b-423a-45ae-b04a-9474584206d1
scm3.org_1 | 2021-06-08 08:11:53,076 [main] INFO server.StorageContainerManager: Primary SCM Node ID 4f54d4de-8942-47b0-a88e-99e5d1bbcad7
scm3.org_1 | 2021-06-08 08:11:53,086 [shutdown-hook-0] INFO server.StorageContainerManagerStarter: SHUTDOWN_MSG:
```
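
## Illustrative sketch (not the patch itself)

For readers following the description, the idea can be sketched roughly as below. This is a minimal, self-contained approximation and not the code in this PR: the classes `SCMSecurityException` and `RetriableWithFailOverException` and the enum `ErrorCode` are local stand-ins that only borrow the names from the description, while `mapForClient`, `ScmCall`, and `callWithFailover` are hypothetical helpers invented for illustration. The sketch also assumes scm1 is the primary, which the description does not state explicitly. In the real code the server-side check appears to live in `RatisUtil.checkRatisException` (per the stack trace), and client-side failover is handled by the FailOverProxyProvider.

```java
import java.io.IOException;
import java.util.List;

public class FailoverMappingSketch {

  // Stand-in for the real error codes; only NOT_A_PRIMARY_SCM comes from the PR text.
  enum ErrorCode { NOT_A_PRIMARY_SCM, DEFAULT }

  // Local stand-in, not the real org.apache.hadoop.hdds class.
  static class SCMSecurityException extends IOException {
    private final ErrorCode errorCode;

    SCMSecurityException(String message, ErrorCode errorCode) {
      super(message);
      this.errorCode = errorCode;
    }

    ErrorCode getErrorCode() {
      return errorCode;
    }
  }

  // Local stand-in: marks an error the client should answer by trying another SCM.
  static class RetriableWithFailOverException extends IOException {
    RetriableWithFailOverException(IOException cause) {
      super(cause);
    }
  }

  // Hypothetical helper: wrap "not a primary SCM" errors, pass everything else through.
  static IOException mapForClient(IOException e) {
    if (e instanceof SCMSecurityException
        && ((SCMSecurityException) e).getErrorCode() == ErrorCode.NOT_A_PRIMARY_SCM) {
      return new RetriableWithFailOverException(e);
    }
    return e;
  }

  // Hypothetical RPC abstraction: one call against one SCM node.
  interface ScmCall {
    String getCertificate(String nodeId) throws IOException;
  }

  // Hypothetical client loop: move to the next node only on the retriable exception.
  static String callWithFailover(List<String> nodeIds, ScmCall call) throws IOException {
    IOException last = null;
    for (String nodeId : nodeIds) {
      try {
        return call.getCertificate(nodeId);
      } catch (RetriableWithFailOverException e) {
        last = e; // this node is not the primary; try the next one
      }
    }
    throw last != null ? last : new IOException("no SCM nodes configured");
  }

  public static void main(String[] args) throws IOException {
    // Same node order as in the manual test; scm1 as primary is an assumption.
    List<String> order = List.of("scm2", "scm3", "scm1");
    String cert = callWithFailover(order, nodeId -> {
      if (!nodeId.equals("scm1")) {
        throw mapForClient(new SCMSecurityException(
            "Get SCM Certificate can be run only primary SCM",
            ErrorCode.NOT_A_PRIMARY_SCM));
      }
      return "cert-from-" + nodeId;
    });
    System.out.println(cert); // prints "cert-from-scm1"
  }
}
```

Any exception other than the wrapped one is rethrown immediately, so only the "wrong SCM" case triggers a move to the next node.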
