[ 
https://issues.apache.org/jira/browse/HDDS-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17777678#comment-17777678
 ] 

István Fajth commented on HDDS-9420:
------------------------------------

Yes, RFC 5246 only says that the root certificate MAY be omitted, but since 
including it causes problems for us and it is optional, I still think the best 
approach is to omit the rootCA from the certificate bundle completely.
I see the following pros:
- no extra handling is needed for this case in the SCM certificates, whereas if 
we keep the current if condition, SCM certificates are always examined and then 
handled by the else case.
- less network traffic during the SSL handshake, as we do not need to transfer 
the rootCA certificate in the bundle
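
To illustrate what omitting the rootCA from the bundle could look like, here is 
a minimal sketch, not the actual Ozone code. The class and method names 
(ChainTrimmer, withoutSelfIssued, withoutRootCa) are hypothetical. It treats a 
certificate whose subject equals its issuer as the root, which is only an 
approximation: a strict check would also verify the signature against the 
certificate's own public key.

```java
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import javax.security.auth.x500.X500Principal;

// Hypothetical helper, for illustration only; not part of Ozone.
final class ChainTrimmer {
  private ChainTrimmer() {
  }

  // Keeps every entry whose subject differs from its issuer, preserving order.
  // Entries with subject == issuer are self-issued and assumed to be the root.
  static <T> List<T> withoutSelfIssued(List<T> chain,
      Function<T, X500Principal> subject,
      Function<T, X500Principal> issuer) {
    List<T> result = new ArrayList<>();
    for (T entry : chain) {
      if (!subject.apply(entry).equals(issuer.apply(entry))) {
        result.add(entry);
      }
    }
    return result;
  }

  // Convenience overload for real certificates.
  static List<X509Certificate> withoutRootCa(List<X509Certificate> chain) {
    return withoutSelfIssued(chain,
        X509Certificate::getSubjectX500Principal,
        X509Certificate::getIssuerX500Principal);
  }
}
```

With a bundle built this way, the peer has to find the rootCA in its own trust 
store, which is the assumption RFC 5246 makes when it allows the omission.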

I don't see any cons. We did not have certificate chains in 1.3, and the interim 
versions have the rootCA in the chain, but the final 1.4 version should migrate 
both the 1.3 and the interim approach to a chain without the rootCA. That leaves 
us with one less potential issue to handle, and we do not have to go to the SCM 
for interim certificates. This issue should only arise anyway if we do not 
already have the rootCA or the interimCA certificates locally, which we 
should.
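
The point about local certificates can be sketched as a local-first lookup 
that only falls back to a remote call when nothing is available locally. This 
is an assumption about one possible shape of the logic, not the existing 
implementation. TrustChainSource and resolve are hypothetical names, and the 
Supplier stands in for whatever would perform the network call to the SCM.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only; not part of Ozone.
final class TrustChainSource {
  private TrustChainSource() {
  }

  // Returns the locally stored certificates when there are any and only
  // invokes the remote lookup when the local list is null or empty.
  static <C> List<C> resolve(List<C> localCerts,
      Supplier<List<C>> remoteLookup) {
    if (localCerts != null && !localCerts.isEmpty()) {
      return localCerts;
    }
    return remoteLookup.get();
  }
}
```

With this shape, a component that already has its CA certificates on disk 
never triggers the remote lookup during startup.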

I know this is more effort today and makes the patch somewhat more complex, 
but I am certain that if we do it now, we will have a simpler codebase that is 
easier to maintain, with fewer potential failure modes, and I firmly believe 
it is worth the effort.

> Enabling GRPC encryption causes SCM startup failure.  
> ------------------------------------------------------
>
>                 Key: HDDS-9420
>                 URL: https://issues.apache.org/jira/browse/HDDS-9420
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Security
>    Affects Versions: 1.4.0
>            Reporter: Sadanand Shenoy
>            Assignee: Sammi Chen
>            Priority: Blocker
>
> HDDS-8178 added a feature to support multiple sub CA certs in the trust chain. 
> In the SCM constructor, if security is enabled and hdds.grpc.tls.enabled is 
> true, it tries to load the keyStoresFactory
> {code:java}
> if (conf.isSecurityEnabled() && conf.isGrpcTlsEnabled()) {
>   KeyStoresFactory serverKeyFactory =
>       certificateClient.getServerKeyStoresFactory(); {code}
> This in turn calls loadKeyManager which tries to load the entire trust chain 
> {code:java}
> private X509ExtendedKeyManager loadKeyManager(CertificateClient caClient)
>     throws GeneralSecurityException, IOException {
>   PrivateKey privateKey = caClient.getPrivateKey();
>   List<X509Certificate> newCertList = caClient.getTrustChain(); {code}
> Loading the entire trust chain does a listCA call, which is a network call to 
> SCMSecurityProtocolServer
> {code:java}
> public List<String> updateCAList() throws IOException {
>   pemEncodedCACertsLock.lock();
>   try {
>     pemEncodedCACerts = getScmSecureClient().listCACertificate(); {code}
> All of this happens inside the StorageContainerManager constructor, but the 
> services in SCM are started only after the constructor has finished and 
> scm.start() is called. This means it sends a request to the security server 
> before that server is even started, which leads to connection refused messages 
> during SCM startup like the ones below:
> {code:java}
> 10:45:45.506 AM             INFO      SCMRatisServerImpl starting Raft server 
> for scm:7b4b7153-eb02-443b-b8f9-3b146931674c
> 10:45:47.563 AM             INFO      RetryInvocationHandler 
> com.google.protobuf.ServiceException: java.net.ConnectException: Call From 
> <HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> $Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961 
> after 1 failover attempts. Trying to failover after sleeping for 2000ms.
> 10:45:49.565 AM             INFO      RetryInvocationHandler 
> com.google.protobuf.ServiceException: java.net.ConnectException: Call From 
> <HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> $Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961 
> after 2 failover attempts. Trying to failover after sleeping for 2000ms.
> (repeated) {code}
> StackTrace
> {code:java}
> java.net.ConnectException: Connection refused
>     at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
>     at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:205)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:586)
>     at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:730)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:843)
>     at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:430)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1681)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1506)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1459)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>     at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
>     at jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
>     at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
>     at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
>     at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
>     at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
>     at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
>     at 
> org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.submitRequest(SCMSecurityProtocolClientSideTranslatorPB.java:102)
>     at 
> org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.listCACertificate(SCMSecurityProtocolClientSideTranslatorPB.java:374)
>     at 
> org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.updateCAList(DefaultCertificateClient.java:933)
>     at 
> org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.listCA(DefaultCertificateClient.java:921)
>     at 
> org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.getTrustChain(DefaultCertificateClient.java:410)
>     at 
> org.apache.hadoop.hdds.security.ssl.ReloadingX509KeyManager.loadKeyManager(ReloadingX509KeyManager.java:204)
>     at 
> org.apache.hadoop.hdds.security.ssl.ReloadingX509KeyManager.<init>(ReloadingX509KeyManager.java:85)
>     at 
> org.apache.hadoop.hdds.security.ssl.PemFileBasedKeyStoresFactory.createKeyManagers(PemFileBasedKeyStoresFactory.java:83)
>     at 
> org.apache.hadoop.hdds.security.ssl.PemFileBasedKeyStoresFactory.init(PemFileBasedKeyStoresFactory.java:104)
>     at 
> org.apache.hadoop.hdds.security.x509.keys.SecurityUtil.getServerKeyStoresFactory(SecurityUtil.java:103)
>     at 
> org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.getServerKeyStoresFactory(DefaultCertificateClient.java:948)
>     at 
> org.apache.hadoop.hdds.scm.ha.HASecurityUtils.createSCMRatisTLSConfig(HASecurityUtils.java:345)
>     at 
> org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.<init>(SCMRatisServerImpl.java:109)
>     at 
> org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl.<init>(SCMHAManagerImpl.java:97)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:646)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:400)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:597)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:609)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:171)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:145)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:74)
>     at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:48)
>     at picocli.CommandLine.executeUserObject(CommandLine.java:1953) {code}


