[
https://issues.apache.org/jira/browse/HDDS-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776727#comment-17776727
]
István Fajth commented on HDDS-9420:
------------------------------------
The evaluation is right; I posted HDDS-9410 for exactly the same flow.
I would propose closing HDDS-9410 as a duplicate, since this issue has much
more detailed context than the one I filed.
As a solution, we should change the certificate merge logic so that it does
not include the rootCA certificate in the certificate chain.
The reasoning is based on RFC 5246 (TLS v1.2), where the certificate bundle
is a certificate_list, defined as:
{quote}
certificate_list
This is a sequence (chain) of certificates. The sender's
certificate MUST come first in the list. Each following
certificate MUST directly certify the one preceding it. Because
certificate validation requires that root keys be distributed
independently, the self-signed certificate that specifies the root
certificate authority MAY be omitted from the chain, under the
assumption that the remote end must already possess it in order to
validate it in any case.
{quote}
In our system that assumption about the rootCA should hold: every component
and every client should have the rootCA in its respective TrustStore, so we
do not need it at the end of the list in the PEM files.
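To illustrate why the chain can omit the root, here is a minimal sketch (not
Ozone code; leafCert, subCaCert and rootCaCert are hypothetical inputs): PKIX
validation succeeds as long as the validator gets the root as a trust anchor
from its own TrustStore, even though the root never travels in the chain.
{code:java}
import java.security.cert.CertPath;
import java.security.cert.CertPathValidator;
import java.security.cert.CertificateFactory;
import java.security.cert.PKIXParameters;
import java.security.cert.TrustAnchor;
import java.security.cert.X509Certificate;
import java.util.Arrays;
import java.util.Collections;

public final class RootFromTrustStoreDemo {

  /**
   * Validates a peer chain that deliberately omits the self-signed root.
   * The root is supplied as a trust anchor, i.e. from the local TrustStore.
   */
  static void validateWithoutRootInChain(X509Certificate leafCert,
      X509Certificate subCaCert, X509Certificate rootCaCert)
      throws Exception {
    TrustAnchor anchor = new TrustAnchor(rootCaCert, null);
    PKIXParameters params =
        new PKIXParameters(Collections.singleton(anchor));
    params.setRevocationEnabled(false);

    // The chain sent over the wire: leaf first, then the sub-CA; no root.
    CertPath path = CertificateFactory.getInstance("X.509")
        .generateCertPath(Arrays.asList(leafCert, subCaCert));

    // Throws CertPathValidatorException only if the chain is invalid.
    CertPathValidator.getInstance("PKIX").validate(path, params);
  }
}
{code}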
Also, openssl verification would emit a warning on such certificate chains,
as there is a self-signed certificate in the chain that is not signed by any
Certificate Authority present in the default certificate authority bundle.
TLS 1.3 defines the list without mentioning self-signed certificates, but it
only relaxes the ordering requirement of 1.2 and does not introduce other
changes, so this kind of certificate bundle works with both TLS 1.2 and 1.3.
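A minimal sketch of the proposed merge-logic change (not the actual Ozone
method; withoutRootCA and isSelfSigned are hypothetical names): drop any
self-signed certificate, i.e. one whose signature verifies with its own
public key, before writing the chain to the PEM file.
{code:java}
import java.security.cert.X509Certificate;
import java.util.List;
import java.util.stream.Collectors;

public final class CertChainFilter {

  /**
   * Returns the chain without the self-signed rootCA, relying on the
   * RFC 5246 allowance quoted above: the remote end already has the
   * root in its TrustStore, so it need not travel in the bundle.
   */
  static List<X509Certificate> withoutRootCA(List<X509Certificate> chain) {
    return chain.stream()
        .filter(cert -> !isSelfSigned(cert))
        .collect(Collectors.toList());
  }

  private static boolean isSelfSigned(X509Certificate cert) {
    try {
      // A root signs itself, so its own public key verifies its signature.
      cert.verify(cert.getPublicKey());
      return true;
    } catch (Exception e) {
      return false;
    }
  }
}
{code}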
> Enabling GRPC encryption causes SCM startup failure.
> ------------------------------------------------------
>
> Key: HDDS-9420
> URL: https://issues.apache.org/jira/browse/HDDS-9420
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Sadanand Shenoy
> Assignee: Sammi Chen
> Priority: Critical
>
> HDDS-8178 added a feature to support multiple sub-CA certs in the trust
> chain. In the SCM constructor, if security is enabled and
> hdds.grpc.tls.enabled is true, it tries to load the KeyStoresFactory:
> {code:java}
> if (conf.isSecurityEnabled() && conf.isGrpcTlsEnabled()) {
>   KeyStoresFactory serverKeyFactory =
>       certificateClient.getServerKeyStoresFactory(); {code}
> This in turn calls loadKeyManager, which tries to load the entire trust
> chain:
> {code:java}
> private X509ExtendedKeyManager loadKeyManager(CertificateClient caClient)
>     throws GeneralSecurityException, IOException {
>   PrivateKey privateKey = caClient.getPrivateKey();
>   List<X509Certificate> newCertList = caClient.getTrustChain(); {code}
> Loading the entire trust chain does a listCA call, which is a network call
> to the SCMSecurityProtocolServer:
> {code:java}
> public List<String> updateCAList() throws IOException {
>   pemEncodedCACertsLock.lock();
>   try {
>     pemEncodedCACerts = getScmSecureClient().listCACertificate(); {code}
> All of this happens inside the StorageContainerManager constructor, but the
> services in SCM are started only after the constructor completes and
> scm.start() is called. This means SCM sends a request to the security
> server before that server is even started, leading to connection-refused
> messages in the SCM startup log like the ones below:
> {code:java}
> 10:45:45.506 AM INFO SCMRatisServerImpl starting Raft server
> for scm:7b4b7153-eb02-443b-b8f9-3b146931674c
> 10:45:47.563 AM INFO RetryInvocationHandler
> com.google.protobuf.ServiceException: java.net.ConnectException: Call From
> <HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception:
> java.net.ConnectException: Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking
> $Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961
> after 1 failover attempts. Trying to failover after sleeping for 2000ms.
> 10:45:49.565 AM INFO RetryInvocationHandler
> com.google.protobuf.ServiceException: java.net.ConnectException: Call From
> <HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception:
> java.net.ConnectException: Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking
> $Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961
> after 2 failover attempts. Trying to failover after sleeping for 2000ms.
> (repeated) {code}
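> To make the ordering concrete, here is a simplified sketch of the startup
> sequence (not the actual code; the shape is hypothetical, the names are
> from the stack trace below):
> {code:java}
> // Constructor work happens first...
> StorageContainerManager scm = StorageContainerManager.createSCM(conf);
> //   ...and it already reaches getServerKeyStoresFactory(), which calls
> //   listCACertificate() -- an RPC to SCMSecurityProtocolServer.
> scm.start();
> //   The security server only starts listening here, so the RPC above
> //   hits "Connection refused" and keeps retrying. {code}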
> Stack trace:
> {code:java}
> java.net.ConnectException: Connection refused
>   at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
>   at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:205)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:586)
>   at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:730)
>   at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:843)
>   at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:430)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1681)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1506)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1459)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
>   at jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
>   at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
>   at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.submitRequest(SCMSecurityProtocolClientSideTranslatorPB.java:102)
>   at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.listCACertificate(SCMSecurityProtocolClientSideTranslatorPB.java:374)
>   at org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.updateCAList(DefaultCertificateClient.java:933)
>   at org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.listCA(DefaultCertificateClient.java:921)
>   at org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.getTrustChain(DefaultCertificateClient.java:410)
>   at org.apache.hadoop.hdds.security.ssl.ReloadingX509KeyManager.loadKeyManager(ReloadingX509KeyManager.java:204)
>   at org.apache.hadoop.hdds.security.ssl.ReloadingX509KeyManager.<init>(ReloadingX509KeyManager.java:85)
>   at org.apache.hadoop.hdds.security.ssl.PemFileBasedKeyStoresFactory.createKeyManagers(PemFileBasedKeyStoresFactory.java:83)
>   at org.apache.hadoop.hdds.security.ssl.PemFileBasedKeyStoresFactory.init(PemFileBasedKeyStoresFactory.java:104)
>   at org.apache.hadoop.hdds.security.x509.keys.SecurityUtil.getServerKeyStoresFactory(SecurityUtil.java:103)
>   at org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.getServerKeyStoresFactory(DefaultCertificateClient.java:948)
>   at org.apache.hadoop.hdds.scm.ha.HASecurityUtils.createSCMRatisTLSConfig(HASecurityUtils.java:345)
>   at org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.<init>(SCMRatisServerImpl.java:109)
>   at org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl.<init>(SCMHAManagerImpl.java:97)
>   at org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:646)
>   at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:400)
>   at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:597)
>   at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:609)
>   at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:171)
>   at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:145)
>   at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:74)
>   at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:48)
>   at picocli.CommandLine.executeUserObject(CommandLine.java:1953) {code}