[
https://issues.apache.org/jira/browse/HDDS-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773613#comment-17773613
]
Attila Doroszlai commented on HDDS-9420:
----------------------------------------
[~sadanand_shenoy], does this happen when starting from scratch, or only
during/after an upgrade from a previous version? The latter is already
reported as HDDS-9410. We cover the former with secure acceptance tests (both HA
and non-HA), which set {{hdds.grpc.tls.enabled=true}} -- are they missing some
test case?
> Enabling GRPC encryption causes SCM startup failure.
> ------------------------------------------------------
>
> Key: HDDS-9420
> URL: https://issues.apache.org/jira/browse/HDDS-9420
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Sadanand Shenoy
> Priority: Major
>
> HDDS-8178 added support for multiple sub-CA certificates in the trust chain. In
> the SCM constructor, if security is enabled and hdds.grpc.tls.enabled is true, it
> tries to load the KeyStoresFactory:
> {code:java}
> if (conf.isSecurityEnabled() && conf.isGrpcTlsEnabled()) {
>   KeyStoresFactory serverKeyFactory =
>       certificateClient.getServerKeyStoresFactory();
> {code}
> This in turn calls loadKeyManager which tries to load the entire trust chain
> {code:java}
> private X509ExtendedKeyManager loadKeyManager(CertificateClient caClient)
>     throws GeneralSecurityException, IOException {
>   PrivateKey privateKey = caClient.getPrivateKey();
>   List<X509Certificate> newCertList = caClient.getTrustChain();
> {code}
> Loading the entire trust chain makes a listCA call, which is a network call to
> SCMSecurityProtocolServer
> {code:java}
> public List<String> updateCAList() throws IOException {
>   pemEncodedCACertsLock.lock();
>   try {
>     pemEncodedCACerts = getScmSecureClient().listCACertificate();
> {code}
> All of this happens inside the StorageContainerManager constructor, but the
> SCM services are started only after the constructor has completed and
> scm.start() is called. The constructor therefore sends a request to the security
> server before that server has started, which leads to connection refused
> messages during SCM startup like the ones below:
> {code:java}
> 10:45:45.506 AM INFO SCMRatisServerImpl starting Raft server
> for scm:7b4b7153-eb02-443b-b8f9-3b146931674c
> 10:45:47.563 AM INFO RetryInvocationHandler
> com.google.protobuf.ServiceException: java.net.ConnectException: Call From
> <HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception:
> java.net.ConnectException: Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking
> $Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961
> after 1 failover attempts. Trying to failover after sleeping for 2000ms.
> 10:45:49.565 AM INFO RetryInvocationHandler
> com.google.protobuf.ServiceException: java.net.ConnectException: Call From
> <HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception:
> java.net.ConnectException: Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking
> $Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961
> after 2 failover attempts. Trying to failover after sleeping for 2000ms.
> (repeated) {code}
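
The ordering problem described in the quoted issue can be sketched in isolation. The example below is not Ozone code: FakeSecurityServer, EagerManager and LazyManager are hypothetical stand-ins, and the "server" is simulated in-process rather than over the network. It only illustrates why a remote call made from a constructor fails when the target service is started by a later start() call, and shows deferring the call until first use as one possible direction. It makes no claim about how HDDS-9420 was actually resolved.

```java
import java.net.ConnectException;
import java.util.Arrays;
import java.util.List;

public class StartupOrderSketch {

  /** Hypothetical stand-in for a server that answers listCA-style requests. */
  static final class FakeSecurityServer {
    private boolean started;

    void start() {
      started = true;
    }

    List<String> listCACertificate() throws ConnectException {
      if (!started) {
        // Simulates "Connection refused": nothing is listening yet.
        throw new ConnectException("Connection refused");
      }
      // Placeholder certificate names, for illustration only.
      return Arrays.asList("root-ca.pem", "sub-ca.pem");
    }
  }

  /** Eager variant: the constructor itself performs the remote call. */
  static final class EagerManager {
    final FakeSecurityServer server = new FakeSecurityServer();
    final List<String> trustChain;

    EagerManager() throws ConnectException {
      // The server is started only by start(), which runs after construction,
      // so this call cannot succeed.
      trustChain = server.listCACertificate();
    }

    void start() {
      server.start();
    }
  }

  /** Deferred variant: the trust chain is fetched on first use. */
  static final class LazyManager {
    final FakeSecurityServer server = new FakeSecurityServer();
    private List<String> trustChain;

    void start() {
      server.start();
    }

    List<String> getTrustChain() throws ConnectException {
      if (trustChain == null) {
        trustChain = server.listCACertificate();
      }
      return trustChain;
    }
  }

  /** Returns true if constructing the eager variant fails to connect. */
  static boolean eagerConstructionFails() {
    try {
      new EagerManager();
      return false;
    } catch (ConnectException e) {
      return true;
    }
  }

  /** Returns the trust chain size of the lazy variant, or -1 on failure. */
  static int lazyTrustChainSize() {
    LazyManager manager = new LazyManager();
    manager.start();
    try {
      return manager.getTrustChain().size();
    } catch (ConnectException e) {
      return -1;
    }
  }

  public static void main(String[] args) {
    System.out.println("eager constructor fails: " + eagerConstructionFails());
    System.out.println("lazy trust chain size: " + lazyTrustChainSize());
  }
}
```

In the sketch the lazy variant works only because getTrustChain() is first called after start(). Any real change would also need to consider the HA case, where the request may be served by another node.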
--
This message was sent by Atlassian Jira
(v8.20.10#820010)