[ 
https://issues.apache.org/jira/browse/HDDS-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776727#comment-17776727
 ] 

István Fajth commented on HDDS-9420:
------------------------------------

The evaluation is right; I have posted HDDS-9410 for exactly the same flow.

I would propose closing HDDS-9410 as a duplicate, as this issue has much more 
detailed context than I provided in the other one.

As a solution, we should change the certificate merge logic so that it does 
not include the rootCA certificate in the certificate chain.
The reasoning is based on RFC 5246 (TLS 1.2), which defines the certificate 
list in a bundle as:
{quote}
   certificate_list
      This is a sequence (chain) of certificates.  The sender's
      certificate MUST come first in the list.  Each following
      certificate MUST directly certify the one preceding it.  Because
      certificate validation requires that root keys be distributed
      independently, the self-signed certificate that specifies the root
      certificate authority MAY be omitted from the chain, under the
      assumption that the remote end must already possess it in order to
      validate it in any case.
{quote}

In our system that assumption about the rootCA should hold: every component 
and client should have the rootCA in its respective TrustStore, so we do not 
need it at the end of the list in the PEM files.
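A minimal sketch of the proposed merge-logic change. The {{CertInfo}} record and the {{trimRootCA}} helper below are hypothetical stand-ins for illustration, not the actual Ozone API; with the real {{java.security.cert.X509Certificate}} the same self-signed check would compare {{getSubjectX500Principal()}} and {{getIssuerX500Principal()}}.
{code:java}
import java.util.List;
import java.util.stream.Collectors;

public class CertChainTrimmer {

  // Simplified stand-in for X509Certificate, carrying only the DNs we need.
  record CertInfo(String subjectDN, String issuerDN) {
    // A self-signed certificate is its own issuer.
    boolean isSelfSigned() {
      return subjectDN.equals(issuerDN);
    }
  }

  // Hypothetical helper: drop the self-signed rootCA from the chain before
  // writing the PEM bundle, per RFC 5246 ("MAY be omitted from the chain").
  static List<CertInfo> trimRootCA(List<CertInfo> chain) {
    return chain.stream()
        .filter(cert -> !cert.isSelfSigned())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<CertInfo> chain = List.of(
        new CertInfo("CN=scm", "CN=subCA"),
        new CertInfo("CN=subCA", "CN=rootCA"),
        new CertInfo("CN=rootCA", "CN=rootCA")); // self-signed root
    // Only the leaf and the subCA certificate remain in the bundle.
    System.out.println(trimRootCA(chain).size()); // prints 2
  }
}
{code}
The remote end then validates the subCA certificate against the rootCA already present in its own TrustStore.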

Also, openssl verification warns on such certificate chains, since they 
contain a self-signed certificate that is not signed by any Certificate 
Authority in the default certificate authority bundle.

TLS 1.3 defines the list without mentioning self-signed certificates; it 
relaxes the ordering requirement of 1.2 and does not introduce other changes, 
so this kind of certificate bundle works with both TLS 1.2 and 1.3.

> Enabling GRPC encryption causes SCM startup failure.  
> ------------------------------------------------------
>
>                 Key: HDDS-9420
>                 URL: https://issues.apache.org/jira/browse/HDDS-9420
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Sadanand Shenoy
>            Assignee: Sammi Chen
>            Priority: Critical
>
> HDDS-8178 added a feature to support multiple sub-CA certs in the trust 
> chain. In the SCM constructor, if security is enabled and 
> hdds.grpc.tls.enabled is true, it tries to load the KeyStoresFactory
> {code:java}
> if (conf.isSecurityEnabled() && conf.isGrpcTlsEnabled()) {
>   KeyStoresFactory serverKeyFactory =
>       certificateClient.getServerKeyStoresFactory(); {code}
> This in turn calls loadKeyManager, which tries to load the entire trust chain 
> {code:java}
> private X509ExtendedKeyManager loadKeyManager(CertificateClient caClient)
>     throws GeneralSecurityException, IOException {
>   PrivateKey privateKey = caClient.getPrivateKey();
>   List<X509Certificate> newCertList = caClient.getTrustChain(); {code}
> Loading the entire trust chain does a listCA call, which is a network call 
> to the SCMSecurityProtocolServer
> {code:java}
> public List<String> updateCAList() throws IOException {
>   pemEncodedCACertsLock.lock();
>   try {
>     pemEncodedCACerts = getScmSecureClient().listCACertificate(); {code}
> All of this happens inside the StorageContainerManager constructor, but the 
> services in SCM are started only after the constructor completes and 
> scm.start() is called. This means a request is sent to the security server 
> before it has even started, leading to connection-refused messages during 
> SCM startup like the ones below:
> {code:java}
> 10:45:45.506 AM             INFO      SCMRatisServerImpl starting Raft server 
> for scm:7b4b7153-eb02-443b-b8f9-3b146931674c
> 10:45:47.563 AM             INFO      RetryInvocationHandler 
> com.google.protobuf.ServiceException: java.net.ConnectException: Call From 
> <HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> $Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961 
> after 1 failover attempts. Trying to failover after sleeping for 2000ms.
> 10:45:49.565 AM             INFO      RetryInvocationHandler 
> com.google.protobuf.ServiceException: java.net.ConnectException: Call From 
> <HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> $Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961 
> after 2 failover attempts. Trying to failover after sleeping for 2000ms.
> (repeated) {code}
> StackTrace
> {code:java}
> java.net.ConnectException: Connection refused
>     at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
>     at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:205)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:586)
>     at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:730)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:843)
>     at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:430)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1681)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1506)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1459)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>     at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
>     at jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
>     at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
>     at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
>     at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
>     at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
>     at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
>     at 
> org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.submitRequest(SCMSecurityProtocolClientSideTranslatorPB.java:102)
>     at 
> org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.listCACertificate(SCMSecurityProtocolClientSideTranslatorPB.java:374)
>     at 
> org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.updateCAList(DefaultCertificateClient.java:933)
>     at 
> org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.listCA(DefaultCertificateClient.java:921)
>     at 
> org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.getTrustChain(DefaultCertificateClient.java:410)
>     at 
> org.apache.hadoop.hdds.security.ssl.ReloadingX509KeyManager.loadKeyManager(ReloadingX509KeyManager.java:204)
>     at 
> org.apache.hadoop.hdds.security.ssl.ReloadingX509KeyManager.<init>(ReloadingX509KeyManager.java:85)
>     at 
> org.apache.hadoop.hdds.security.ssl.PemFileBasedKeyStoresFactory.createKeyManagers(PemFileBasedKeyStoresFactory.java:83)
>     at 
> org.apache.hadoop.hdds.security.ssl.PemFileBasedKeyStoresFactory.init(PemFileBasedKeyStoresFactory.java:104)
>     at 
> org.apache.hadoop.hdds.security.x509.keys.SecurityUtil.getServerKeyStoresFactory(SecurityUtil.java:103)
>     at 
> org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.getServerKeyStoresFactory(DefaultCertificateClient.java:948)
>     at 
> org.apache.hadoop.hdds.scm.ha.HASecurityUtils.createSCMRatisTLSConfig(HASecurityUtils.java:345)
>     at 
> org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.<init>(SCMRatisServerImpl.java:109)
>     at 
> org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl.<init>(SCMHAManagerImpl.java:97)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:646)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:400)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:597)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:609)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:171)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:145)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:74)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:48)
>     at picocli.CommandLine.executeUserObject(CommandLine.java:1953) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
