Sadanand Shenoy created HDDS-9420:
-------------------------------------

             Summary: Enabling GRPC encryption causes SCM startup failure.  
                 Key: HDDS-9420
                 URL: https://issues.apache.org/jira/browse/HDDS-9420
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Sadanand Shenoy


HDDS-8178 added support for multiple sub-CA certificates in the trust chain. 
In the SCM constructor, if security is enabled and hdds.grpc.tls.enabled is 
true, it tries to load the KeyStoresFactory:
{code:java}
if (conf.isSecurityEnabled() && conf.isGrpcTlsEnabled()) {
  KeyStoresFactory serverKeyFactory =
      certificateClient.getServerKeyStoresFactory(); {code}
This in turn calls loadKeyManager, which tries to load the entire trust chain:
{code:java}
private X509ExtendedKeyManager loadKeyManager(CertificateClient caClient)
    throws GeneralSecurityException, IOException {
  PrivateKey privateKey = caClient.getPrivateKey();
  List<X509Certificate> newCertList = caClient.getTrustChain(); {code}
Loading the entire trust chain makes a listCA call, which is a network call to 
SCMSecurityProtocolServer:
{code:java}
public List<String> updateCAList() throws IOException {
  pemEncodedCACertsLock.lock();
  try {
    pemEncodedCACerts = getScmSecureClient().listCACertificate(); {code}
All of this happens inside the StorageContainerManager constructor, but the 
SCM services are started only after the constructor has completed and 
scm.start() is called. SCM therefore sends a request to the security server 
before that server has started, which leads to connection-refused messages 
during SCM startup like the ones below:


{code:java}
10:45:45.506 AM             INFO      SCMRatisServerImpl starting Raft server 
for scm:7b4b7153-eb02-443b-b8f9-3b146931674c
10:45:47.563 AM             INFO      RetryInvocationHandler 
com.google.protobuf.ServiceException: java.net.ConnectException: Call From 
<HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
$Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961 after 
1 failover attempts. Trying to failover after sleeping for 2000ms.
10:45:49.565 AM             INFO      RetryInvocationHandler 
com.google.protobuf.ServiceException: java.net.ConnectException: Call From 
<HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
$Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961 after 
2 failover attempts. Trying to failover after sleeping for 2000ms.
(repeated) {code}
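
The ordering problem described above can be reproduced in isolation. The following is a minimal sketch, not Ozone code: FakeSecurityServer, EagerScm and LazyScm are hypothetical stand-ins, and the lazy variant only illustrates one possible way to avoid the early call (deferring the CA list fetch until after start()). It is not necessarily how this issue should be or was fixed.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.List;

public class StartupOrderSketch {

  /** Hypothetical stand-in for the security server: refuses calls until started. */
  static class FakeSecurityServer {
    private boolean started;

    void start() {
      started = true;
    }

    List<String> listCACertificate() throws IOException {
      if (!started) {
        // Mirrors the "Connection refused" seen in the SCM log.
        throw new ConnectException("Connection refused");
      }
      return List.of("pem-encoded-root-ca", "pem-encoded-sub-ca");
    }
  }

  /** Eager variant: fetches the CA list inside the constructor, as in the report. */
  static class EagerScm {
    final List<String> caCerts;

    EagerScm(FakeSecurityServer server) throws IOException {
      // The server has not been started yet, so this call fails.
      this.caCerts = server.listCACertificate();
    }
  }

  /** Lazy variant (assumption): defers the fetch until the list is first needed. */
  static class LazyScm {
    private final FakeSecurityServer server;
    private List<String> caCerts;

    LazyScm(FakeSecurityServer server) {
      this.server = server; // no network call in the constructor
    }

    void start() {
      server.start();
    }

    synchronized List<String> getCaCerts() throws IOException {
      if (caCerts == null) {
        caCerts = server.listCACertificate();
      }
      return caCerts;
    }
  }

  /** Returns true if constructing the eager variant fails before start(). */
  static boolean eagerConstructionFails() {
    try {
      new EagerScm(new FakeSecurityServer());
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  /** Returns the number of CA certs seen by the lazy variant, or -1 on error. */
  static int lazyCertCountAfterStart() {
    LazyScm scm = new LazyScm(new FakeSecurityServer());
    scm.start();
    try {
      return scm.getCaCerts().size();
    } catch (IOException e) {
      return -1;
    }
  }

  public static void main(String[] args) {
    System.out.println("eager construction fails: " + eagerConstructionFails());
    System.out.println("lazy cert count after start: " + lazyCertCountAfterStart());
  }
}
```

In this sketch the eager variant fails during construction because the server it depends on is only started afterwards, while the lazy variant succeeds because the first fetch happens after start().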



