[ https://issues.apache.org/jira/browse/ZOOKEEPER-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901732#comment-17901732 ]
Scott Guminy commented on ZOOKEEPER-4888:
-----------------------------------------

[~andor] I work with Aayush and Jeetendra. I investigated this a bit more and have some ideas on the cause of this issue.

The IBM JDK uses different cipher names internally compared to other JDK implementations. You can see that information here: [https://www.ibm.com/docs/en/sdk-java-technology/8?topic=suites-cipher]

Specifically, some cipher names start with SSL_ instead of TLS_. If you call ((SSLServerSocketFactory) SSLServerSocketFactory.getDefault()).getSupportedCipherSuites(), these ciphers will be returned with the SSL_ prefix.

Why is this a problem for ZooKeeper?

In X509Util, there are several methods that return lists of ciphers that ZK wants to use (getTLSv13Ciphers, getGCMCiphers, getCBCCiphers). These cipher names start with TLS_. Later, in the getSupportedCiphers method, ZK filters out unsupported ciphers by comparing the hardcoded lists to what the JDK's ((SSLServerSocketFactory) SSLServerSocketFactory.getDefault()).getSupportedCipherSuites() call returns.

For the IBM JDK, there's little overlap due to the naming differences, and this results in no TLS 1.2 ciphers in the list. This makes it impossible for any client to connect with TLS 1.2.

The solution is to add the SSL_ ciphers to the getTLSv13Ciphers (maybe not needed), getGCMCiphers, and getCBCCiphers methods, to properly support the IBM JDK.

> Issues with TLS post upgrade from 3.9.1 to 3.9.2
> ------------------------------------------------
>
>                 Key: ZOOKEEPER-4888
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4888
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.9.2
>            Reporter: Jeetendra N
>            Assignee: Andor Molnar
>            Priority: Major
>             Fix For: 3.9.2
>
>
> We upgraded Zookeeper ensemble from 3.9.1 to 3.9.2. TLS (node-node,
> client-node) is enabled before upgrade. Everything was working fine before
> upgrade.
> Post upgrade ->
> # Stopped everything (all ZK nodes)
> # Started all ZK nodes
> # Checked if SSL is happening between ZK nodes is fine or not
> # Its confirmed that SSL is working fine between ZK nodes.
> # Now started just one instance of client application
> # Post that we see intermittent successful & unsuccessful handshake messages
> in ZK logs.
> *ZK server side, we see below messages:*
> 2024-11-21 13:28:15,586 [myid:] - DEBUG
> [epollEventLoopGroup-4-9:o.a.z.c.X509Util@599] - FIPS mode is ON: selecting
> standard x509 trust manager com.ibm.jsse2.br@4362299c
> 2024-11-21 13:28:15,586 [myid:] - DEBUG
> [epollEventLoopGroup-4-9:o.a.z.c.X509Util@644] - Using Java8 optimized cipher
> suites for Java version 1.8
> 2024-11-21 13:28:15,588 [myid:] - DEBUG
> [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxnFactory@596] - SSL handler
> added for channel: [id: 0x2443db1c, L:/10.1.10.50:2181 - R:/10.1.10.46:57272]
> 2024-11-21 13:28:15,620 [myid:] - DEBUG
> [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxnFactory$CertificateVerifier@415]
> - *Successful handshake with session 0x0*
> 2024-11-21 13:28:15,620 [myid:] - DEBUG
> [epollEventLoopGroup-4-9:i.n.h.s.SslHandler@1934] - [id: 0x2443db1c,
> L:/10.1.10.50:2181 - R:/10.1.10.46:57272] HANDSHAKEN: protocol:TLSv1.3 cipher
> suite:TLS_AES_256_GCM_SHA384
> 2024-11-21 13:28:15,622 [myid:] - DEBUG
> [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxnFactory$CnxnChannelHandler@350]
> - New message PooledUnsafeDirectByteBuf(ridx: 0, widx: 4, cap: 42) from [id:
> 0x2443db1c, L:/10.1.10.50:2181 - R:/10.1.10.46:57272]
> 2024-11-21 13:28:15,622 [myid:] - DEBUG
> [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@368] - 0x0 queuedBuffer: null
> 2024-11-21 13:28:15,622 [myid:] - DEBUG
> [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@386] - not throttled
> 2024-11-21 13:28:15,623 [myid:] - INFO
> [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@311] - Processing mntr
> command from /10.1.10.46:57272
> 2024-11-21 13:28:15,642 [myid:] - DEBUG
> [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@113] - close called for
> session id: 0x0
> 2024-11-21 13:28:15,642 [myid:] - DEBUG
> [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@131] - close in progress for
> session id: 0x0
> 2024-11-21 13:28:15,644 [myid:] - DEBUG
> [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@113] - close called for
> session id: 0x0
> 2024-11-21 13:28:15,644 [myid:] - DEBUG
> [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@124] - cnxns size:0
> 2024-11-21 13:28:17,155 [myid:] - DEBUG
> [epollEventLoopGroup-4-10:o.a.z.c.X509Util@599] - FIPS mode is ON: selecting
> standard x509 trust manager com.ibm.jsse2.br@a5cca67c
> 2024-11-21 13:28:17,156 [myid:] - DEBUG
> [epollEventLoopGroup-4-10:o.a.z.c.X509Util@644] - Using Java8 optimized
> cipher suites for Java version 1.8
> 2024-11-21 13:28:17,158 [myid:] - DEBUG
> [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxnFactory@596] - SSL handler
> added for channel: [id: 0xb818882d, L:/10.1.10.50:2181 - R:/10.1.10.46:57276]
> 2024-11-21 13:28:17,161 [myid:] - ERROR
> [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxnFactory$CertificateVerifier@466]
> - *Unsuccessful handshake with session 0x0*
> 2024-11-21 13:28:17,161 [myid:] - DEBUG
> [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxn@113] - close called for
> session id: 0x0
> 2024-11-21 13:28:17,162 [myid:] - DEBUG
> [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxn@124] - cnxns size:0
> 2024-11-21 13:28:17,163 [myid:] - DEBUG
> [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxn@113] - close called for
> session id: 0x0
> 2024-11-21 13:28:17,163 [myid:] - DEBUG
> [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxn@124] - cnxns size:0
>
> *At client side, we see below message intermittently.*
> 17:37:43.878 [pool-7-thread-1-SendThread(10.1.10.50:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Session 0x0 for server
> bdc-dev1807.in.syncsort.dev/10.1.10.50:2181, Closing socket connection.
> Attempting reconnect except it is a SessionExpiredException.
> org.apache.zookeeper.ClientCnxn$EndOfStreamException: channel for sessionid
> 0x0 is lost
>     at org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:287)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1274)
> *We also see successful SSL connections from client side as well*
> INFO: Connected via SSL to server : 10.1.10.50 @ port : 2181
> Nov 21, 2024 5:46:04 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
> INFO: Connected via SSL to server : 10.1.10.46 @ port : 2181
> Nov 21, 2024 5:46:09 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
> INFO: Connected via SSL to server : 10.1.10.46 @ port : 2181
> Nov 21, 2024 5:46:09 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
> INFO: Connected via SSL to server : 10.1.10.50 @ port : 2181
> Nov 21, 2024 5:46:14 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
> INFO: Connected via SSL to server : 10.1.10.46 @ port : 2181
> Nov 21, 2024 5:46:14 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
> INFO: Connected via SSL to server : 10.1.10.50 @ port : 2181
> *We have not set any TLS protocol version or Ciphers at client or server
> side.*
> *We are using IBM JDK 8.*
> Please help troubleshoot this issue



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
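
Illustration of the cipher-name mismatch described in the comment at the top of this message. This is a minimal sketch, not ZooKeeper code: the class CipherNameFilter and both methods are hypothetical, and the sample suite names are chosen only to show the SSL_ versus TLS_ prefix difference. The first method mimics an exact-name filter like the one the comment attributes to getSupportedCiphers. The second is one possible variant that derives the SSL_ spelling for each TLS_ name, which is a different approach from adding SSL_ names to the hard-coded lists as the comment proposes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of ZooKeeper.
class CipherNameFilter {

    // Exact-name filter: keep only the wanted suites that appear,
    // spelled identically, in the list the JDK reports as supported.
    public static List<String> filterExact(List<String> wanted, String[] supported) {
        Set<String> supportedSet = new HashSet<>(Arrays.asList(supported));
        List<String> result = new ArrayList<>();
        for (String name : wanted) {
            if (supportedSet.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    // Assumed variant: if a TLS_ name is not supported as written,
    // also try the same name with an SSL_ prefix and keep that spelling.
    public static List<String> filterWithSslAlias(List<String> wanted, String[] supported) {
        Set<String> supportedSet = new HashSet<>(Arrays.asList(supported));
        List<String> result = new ArrayList<>();
        for (String name : wanted) {
            if (supportedSet.contains(name)) {
                result.add(name);
            } else if (name.startsWith("TLS_")) {
                String alias = "SSL_" + name.substring("TLS_".length());
                if (supportedSet.contains(alias)) {
                    result.add(alias);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Names an application might hard-code (TLS_ prefix).
        List<String> wanted = Arrays.asList(
                "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
                "TLS_AES_256_GCM_SHA384");
        // Sample of what a JDK using SSL_ names for some suites might report.
        String[] supported = {
                "SSL_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
                "TLS_AES_256_GCM_SHA384"};

        System.out.println("exact match:  " + filterExact(wanted, supported));
        System.out.println("with aliases: " + filterWithSslAlias(wanted, supported));
    }
}
```

With the sample input, the exact-name filter drops the TLS 1.2 suite and keeps only the TLS 1.3 one, which matches the pattern in the quoted server log where the successful handshake negotiated TLSv1.3 with TLS_AES_256_GCM_SHA384. To check a real JDK, the sample array could be replaced with the result of ((SSLServerSocketFactory) SSLServerSocketFactory.getDefault()).getSupportedCipherSuites().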