Sadanand Shenoy created HDDS-9420:
-------------------------------------
Summary: Enabling GRPC encryption causes SCM startup failure.
Key: HDDS-9420
URL: https://issues.apache.org/jira/browse/HDDS-9420
Project: Apache Ozone
Issue Type: Bug
Reporter: Sadanand Shenoy
HDDS-8178 added support for multiple sub-CA certificates in the trust chain.
In the SCM constructor, if security is enabled and hdds.grpc.tls.enabled is
true, SCM tries to load the KeyStoresFactory:
{code:java}
if (conf.isSecurityEnabled() && conf.isGrpcTlsEnabled()) {
  KeyStoresFactory serverKeyFactory =
      certificateClient.getServerKeyStoresFactory();
{code}
This in turn calls loadKeyManager, which tries to load the entire trust chain:
{code:java}
private X509ExtendedKeyManager loadKeyManager(CertificateClient caClient)
    throws GeneralSecurityException, IOException {
  PrivateKey privateKey = caClient.getPrivateKey();
  List<X509Certificate> newCertList = caClient.getTrustChain();
{code}
Loading the entire trust chain triggers a listCA call, which is a network call
to SCMSecurityProtocolServer:
{code:java}
public List<String> updateCAList() throws IOException {
  pemEncodedCACertsLock.lock();
  try {
    pemEncodedCACerts = getScmSecureClient().listCACertificate();
{code}
All of this happens inside the StorageContainerManager constructor, but the
SCM services are started only after the constructor has completed and
scm.start() is called. SCM therefore sends a request to the security server
before that server has started, which leads to connection refused messages
during SCM startup like the ones below:
{code:java}
10:45:45.506 AM INFO SCMRatisServerImpl starting Raft server
for scm:7b4b7153-eb02-443b-b8f9-3b146931674c
10:45:47.563 AM INFO RetryInvocationHandler
com.google.protobuf.ServiceException: java.net.ConnectException: Call From
<HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking
$Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961 after
1 failover attempts. Trying to failover after sleeping for 2000ms.
10:45:49.565 AM INFO RetryInvocationHandler
com.google.protobuf.ServiceException: java.net.ConnectException: Call From
<HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking
$Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961 after
2 failover attempts. Trying to failover after sleeping for 2000ms.
(repeated) {code}
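
The ordering problem can be reproduced outside Ozone with a small
self-contained sketch. Everything in it is hypothetical and for illustration
only: ToyScm, fetchCaList and freePort are stand-ins, not Ozone classes or
APIs, and a plain TCP connect stands in for the listCA RPC. The first check
makes the call inside the constructor, before start() has opened the
listening socket; the second defers the call until after start(). Deferring
is shown only as one possible direction, not necessarily the fix adopted for
this issue.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

class StartupOrderSketch {

  /** Hypothetical stand-in for SCM: nothing listens on the port until start(). */
  static class ToyScm implements AutoCloseable {
    private final int port;
    private ServerSocket listener;

    ToyScm(int port, boolean fetchInConstructor) throws IOException {
      this.port = port;
      if (fetchInConstructor) {
        // Mirrors the reported pattern: a network call to our own
        // server made while the constructor is still running.
        fetchCaList();
      }
    }

    /** Opens the listening socket; stands in for scm.start(). */
    void start() throws IOException {
      listener = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
    }

    /** Stand-in for the listCA call: just a TCP connect to the server port. */
    void fetchCaList() throws IOException {
      try (Socket socket = new Socket()) {
        socket.connect(
            new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 1000);
      }
    }

    @Override
    public void close() throws IOException {
      if (listener != null) {
        listener.close();
      }
    }
  }

  /** Finds a currently unused loopback port by binding to port 0 and releasing it. */
  static int freePort() throws IOException {
    try (ServerSocket probe =
        new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
      return probe.getLocalPort();
    }
  }

  /** Returns true if the constructor-time call fails with ConnectException. */
  static boolean eagerConstructorIsRefused() {
    try (ToyScm scm = new ToyScm(freePort(), true)) {
      return false; // the call unexpectedly succeeded
    } catch (ConnectException e) {
      return true; // nothing is listening yet
    } catch (IOException e) {
      return false; // some other I/O failure
    }
  }

  /** Returns true if the same call succeeds once it is made after start(). */
  static boolean deferredCallSucceeds() {
    try (ToyScm scm = new ToyScm(freePort(), false)) {
      scm.start();
      scm.fetchCaList();
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("call in constructor refused: " + eagerConstructorIsRefused());
    System.out.println("call after start() succeeds: " + deferredCallSucceeds());
  }
}
```

The sketch connects once and gives up, whereas the log above shows the real
client retrying with failover, so in SCM the symptom is repeated retries
rather than a single exception.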