[ https://issues.apache.org/jira/browse/HDDS-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaoyu Yao resolved HDDS-5078.
------------------------------
Fix Version/s: 1.2.0
Resolution: Fixed
Thanks [~bharat] for the contribution. PR has been merged.
> [SCM HA Security] NPE during secure SCM initialization with HA code updated
> to an already existing cluster
> ----------------------------------------------------------------------------------------------------------
>
> Key: HDDS-5078
> URL: https://issues.apache.org/jira/browse/HDDS-5078
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: SCM HA
> Reporter: István Fajth
> Assignee: Bharat Viswanadham
> Priority: Blocker
> Fix For: 1.2.0
>
>
> On a Cloudera Manager managed cluster, SCM is always started with the --init
> option specified, and this behaviour revealed the following null pointer
> dereference:
> StorageContainerManager#initializeCertificateClient initializes the
> scmCertificateClient only if scmStorageConfig#checkPrimarySCMIdInitialized()
> evaluates to true. It evaluates to true if the VERSION file contains
> primaryScmNodeId with a value.
> If you upgrade an existing cluster with a single SCM to this code, the
> VERSION file does not contain a primaryScmNodeId, so the scmCertificateClient
> remains null.
> Later the initialization code calls the
> StorageContainerManager#initializeCAnSecurityProtocol method, which at the
> end creates the securityProtocolServer. For the constructor call, the
> rootCACert is obtained by calling scmCertificateClient#getCACertificate,
> but this is a null dereference because scmCertificateClient is null.
> A null scmCertificateClient can cause problems later as well, as it
> is used multiple times unconditionally.
> Later on, after working around this particular problem (by simply letting the
> code create the scmCertificateClient unconditionally), it turned out that in
> the StorageContainerManager#initializeCAnSecurityProtocol call the
> scmCertificateServer and the rootCertificateServer instances also remain
> uninitialized, which causes problems when an SCM client tries to get the
> root CA certificate from the SCM.
> To me this suggests that initialization of SCM fails after an upgrade on an
> old cluster. This was working fine before: --init did not reinitialize
> anything, but it worked fine.
> If I change the Cloudera Manager behaviour so that it does not init the SCM
> when I start it, I still get the same NPE from the SCM as with --init.
> The exception I get in the SCM log is as follows. The command I issue is the
> recommission of a DN that was decommissioned before the upgrade.
> {code}
> java.lang.NullPointerException
> at org.apache.hadoop.hdds.protocol.proto.SCMSecurityProtocolProtos$SCMGetCertResponseProto$Builder.setX509RootCACertificate(SCMSecurityProtocolProtos.java:9026)
> at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.getCACertificate(SCMSecurityProtocolServerSideTranslatorPB.java:257)
> at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.processRequest(SCMSecurityProtocolServerSideTranslatorPB.java:104)
> at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
> at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.submitRequest(SCMSecurityProtocolServerSideTranslatorPB.java:89)
> at org.apache.hadoop.hdds.protocol.proto.SCMSecurityProtocolProtos$SCMSecurityProtocolService$2.callBlockingMethod(SCMSecurityProtocolProtos.java:10537)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:986)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:914)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2887)
> {code}
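For readers following the thread, the failure mode in the quoted description can be reproduced in isolation. The sketch below is illustrative only and is not the merged fix. Every class and method body in it (ScmInitSketch, CertClient, ResponseBuilder, rootCaGuarded, the dummy certificate string) is a hypothetical stand-in, not Ozone code. It assumes only what the report states: the certificate client is created conditionally, and the root CA certificate is later passed unconditionally to a protobuf-style builder setter that rejects null.

```java
import java.util.Objects;

// Hypothetical stand-ins only; none of these are the real Ozone classes.
final class ScmInitSketch {

  /** Stand-in for the certificate client; models only getCACertificate(). */
  static final class CertClient {
    private final String caCertPem;
    CertClient(String caCertPem) { this.caCertPem = caCertPem; }
    String getCACertificate() { return caCertPem; }
  }

  /** Stand-in for a generated protobuf builder, whose setters reject null. */
  static final class ResponseBuilder {
    private String rootCaCert;
    ResponseBuilder setX509RootCACertificate(String value) {
      this.rootCaCert = Objects.requireNonNull(value);
      return this;
    }
    String build() { return rootCaCert; }
  }

  /**
   * Mirrors the reported behaviour: the client is created only when the
   * VERSION file carries a primaryScmNodeId; otherwise it stays null.
   */
  static CertClient initializeCertificateClient(String primaryScmNodeId) {
    if (primaryScmNodeId != null && !primaryScmNodeId.isEmpty()) {
      return new CertClient("dummy-root-ca-pem"); // placeholder value
    }
    return null;
  }

  /** Unconditional use, as in the report: NPE when the client is null. */
  static String rootCaUnguarded(CertClient client) {
    return new ResponseBuilder()
        .setX509RootCACertificate(client.getCACertificate())
        .build();
  }

  /** One possible defensive variant: fail fast with a descriptive message. */
  static String rootCaGuarded(CertClient client) {
    if (client == null) {
      throw new IllegalStateException(
          "certificate client not initialized (no primaryScmNodeId in VERSION?)");
    }
    return rootCaUnguarded(client);
  }

  public static void main(String[] args) {
    // An upgraded single-SCM cluster: VERSION has no primaryScmNodeId.
    CertClient upgraded = initializeCertificateClient(null);
    try {
      rootCaUnguarded(upgraded);
    } catch (NullPointerException e) {
      System.out.println("unguarded: NullPointerException");
    }
    try {
      rootCaGuarded(upgraded);
    } catch (IllegalStateException e) {
      System.out.println("guarded: " + e.getMessage());
    }
  }
}
```

The guarded variant only turns the NPE into a clearer error. Per the description, the underlying problem is that the client (and the certificate servers) are never initialized on the upgrade path, so a real fix has to address initialization itself.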