Sounds like a plan, but I would do the 2 things in separate patches: the current patch covers the toggle for client hostname verification and I think it’s already ready for submit.
Another patch should cover the alternate way of hostname verification based on what we discussed. Could you please update the Jira ticket(s) accordingly to make sure we’re on the same page? Thanks, Andor > On Nov 19, 2024, at 15:19, Sönke Liebau > <soenke.lie...@stackable.tech.INVALID> wrote: > > It would be nice, being able to find out which peer is trying to connect, > but as you say, we won't have the id available before the handshake is done > (unless we have users stick it in the cert, and I don't think we want that > :) ) > Anything we could do to find out would probably need to involve DNS and > some dirty compromise.. so probably best to just not go there. > > Your idea for backwards compatible config sounds sensible to me. > It would probably make sense to get this done as part of > https://github.com/apache/zookeeper/pull/2173 and discuss possible config > changes up front for everything? > > Best, Sönke > > > On Tue, 19 Nov 2024, 16:46 Andor Molnar, <an...@kazinczy.hu> wrote: > >> Correction, sorry: >> >>> Not necessarily one SAN matches, but _the one_ matches which corresponds >> to the peer. >> >> I meant not any of the listed server names matches in zoo.cfg, but we >> might be able to identify which peer is trying to connect. This is probably >> not feasible, because at the time of TLS handshake the peer id is not yet >> available. >> >> We should do this in backward compatible way: client hostname >> verification, by default, works at is, but we introduce an option to >> disable reverse DNS lookups. That switches the behavior to check against >> list of hostnames in zoo.cfg. In addition we enable client hostname >> verification to be completely disabled too. >> >> Wdyt? >> >> Andor >> >> >> >> >>> On Nov 19, 2024, at 09:38, Andor Molnar <an...@kazinczy.hu> wrote: >>> >>> Hi Sönke, >>> >>> Are you still working on the patch btw? >>> https://github.com/apache/zookeeper/pull/2173 >>> >>> (Though I’m not sure if it was you who opened it) >>> >>> Thanks for the pointer, ZooKeeper support for K8s is unfortunately still >> far from perfect. Introducing the FIPS mode option might not have been the >> best choice, but we first faced the issue during FIPS deployment where you >> cannot use custom hostname verificator, so we had to disable it completely. >>> >>> I’d like to idea of >>> >>> "The ZK server could verify the SAN against the list of servers >> (servers.N in the config). A peer should be able to connect on the quorum >> port if and only if at least one SAN matches at least one of the listed >> servers.” >>> >>> Not necessarily one SAN matches, but _the one_ matches which corresponds >> to the peer. Does that make sense? Not sure if technically feasible, but >> are you willing to create a patch? >>> >>> Regards, >>> Andor >>> >>> >>> >>> >>>> On Nov 19, 2024, at 03:11, Sönke Liebau >> <soenke.lie...@stackable.tech.INVALID> wrote: >>>> >>>> You are probably running into ZOOKEEPER-4790 [1] here. >>>> >>>> When we encountered this back in the day [2] we figured out that >> enabling >>>> FIPS mode bypasses all the ZK specific TLS checks and makes it work. In >> the >>>> ZK version you are on it is not yet enabled by default, you could either >>>> update or set *zookeeper.fips-mode *and this error _should_ go away. >>>> >>>> Good luck :) >>>> >>>> Best, >>>> Sönke >>>> >>>> [1] https://issues.apache.org/jira/browse/ZOOKEEPER-4790 >>>> [2] https://github.com/stackabletech/zookeeper-operator/issues/760 >>>> >>>> On Tue, Nov 19, 2024 at 9:08 AM Dharani (Jira) <j...@apache.org> wrote: >>>> >>>>> Dharani created ZOOKEEPER-4887: >>>>> ---------------------------------- >>>>> >>>>> Summary: Zookeeper quorum formation fails when TLS is >> enabled >>>>> in k8s env >>>>> Key: ZOOKEEPER-4887 >>>>> URL: >> https://issues.apache.org/jira/browse/ZOOKEEPER-4887 >>>>> Project: ZooKeeper >>>>> Issue Type: Bug >>>>> Affects Versions: 3.8.3 >>>>> Reporter: Dharani >>>>> >>>>> >>>>> We have three(3) node zookeeper cluster running as a pod on Kubernetes >>>>> cluster, zookeeper quorum formation fails with TLS handshake error, as >> the >>>>> server name in the https request does not match with any of the SANs >> in the >>>>> certificate configured for zookeeper server. Server name in the >> request is >>>>> of the form "x-x-x-x.kubernetes.default.svc.cluster.local" (where >> x-x-x-x >>>>> is the IP address of the POD), and I am unable to understand the reason >>>>> behind pre-pending FQDN with a IP address. >>>>> >>>>> >>>>> >>>>> Please find below the extract of the error logs from the zookeeper POD >>>>> {code:java} >>>>> [myid:] - ERROR [LearnerHandler-/192.168.220. >>>>> 10:46516:o.a.z.c.ZKTrustManager@191] - Failed to verify host address: >>>>> 192.168.220.10 >>>>> javax.net.ssl.SSLPeerUnverifiedException: Certificate for >> <192.168.220.10> >>>>> doesn't match any of the subject alternative names: >>>>> [eric-data-coordinator-zk, eric-data-coordinator-zk.zdhagxx1, >>>>> eric-data-coordinator-zk.zdhagxx1.svc, >>>>> eric-data-coordinator-zk.zdhagxx1.svc.cluster.local, >>>>> *.eric-data-coordinator-zk-ensemble-service.zdhagxx1.svc.cluster.local, >>>>> certified-scrape-target] >>>>> >>>>> >> org.apache.zookeeper.common.ZKHostnameVerifier.matchIPAddress(ZKHostnameVerifier.java:197) >>>>> >>>>> >> org.apache.zookeeper.common.ZKHostnameVerifier.verify(ZKHostnameVerifier.java:165) >>>>> >>>>> >> org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:180) >>>>> >>>>> >> org.apache.zookeeper.common.ZKTrustManager.checkClientTrusted(ZKTrustManager.java:93) >>>>> >>>>> >> java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkClientCerts(CertificateMessage.java:1285) >>>>> >>>>> >> java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1204) >>>>> >>>>> >> java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1181) >>>>> java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392) >>>>> >>>>> >> java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443) >>>>> >>>>> >> java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421) >>>>> >>>>> >> java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:183) >>>>> java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172) >>>>> >> java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1511) >>>>> >>>>> >> java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1421) >>>>> >>>>> >> java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:456) >>>>> >>>>> >> java.base/sun.security.ssl.SSLSocketImpl.ensureNegotiated(SSLSocketImpl.java:926) >>>>> >> java.base/sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:372) >>>>> >>>>> >> org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.detectMode(UnifiedServerSocket.java:269) >>>>> >>>>> >> org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.getSocket(UnifiedServerSocket.java:298) >>>>> >>>>> >> org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.access$400(UnifiedServerSocket.java:172) >>>>> >> org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.getRealInputStream(UnifiedServerSocket.java:699) >>>>> >> org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:693) >>>>> java.base/java.io >> .BufferedInputStream.fill(BufferedInputStream.java:252) >>>>> java.base/java.io >> .BufferedInputStream.read(BufferedInputStream.java:271) >>>>> java.base/java.io.DataInputStream.readInt(DataInputStream.java:392) >>>>> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96) >>>>> >>>>> >> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86) >>>>> >> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134) >>>>> >> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:472)[myid:] >>>>> - ERROR [LearnerHandler-/192.168.220. >> 10:46516:o.a.z.c.ZKTrustManager@192] >>>>> - Failed to verify hostname: >>>>> 192-168-220-10.eric-data-coordinator-zk.zdhagxx1.svc.cluster.local >>>>> javax.net.ssl.SSLPeerUnverifiedException: Certificate for >>>>> <192-168-220-10.eric-data-coordinator-zk.zdhagxx1.svc.cluster.local> >>>>> doesn't match any of the subject alternative names: >>>>> [eric-data-coordinator-zk, eric-data-coordinator-zk.zdhagxx1, >>>>> eric-data-coordinator-zk.zdhagxx1.svc, >>>>> eric-data-coordinator-zk.zdhagxx1.svc.cluster.local, >>>>> *.eric-data-coordinator-zk-ensemble-service.zdhagxx1.svc.cluster.local, >>>>> certified-scrape-target] >>>>> >> org.apache.zookeeper.common.ZKHostnameVerifier.matchDNSName(ZKHostnameVerifier.java:230) >>>>> >> org.apache.zookeeper.common.ZKHostnameVerifier.verify(ZKHostnameVerifier.java:171) >>>>> >>>>> >> org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:189) >>>>> >>>>> >> org.apache.zookeeper.common.ZKTrustManager.checkClientTrusted(ZKTrustManager.java:93) >>>>> >> java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkClientCerts(CertificateMessage.java:1285) >>>>> >> java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1204) >>>>> >> java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1181) >>>>> java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392) >>>>> >>>>> >> java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443) >>>>> >>>>> >> java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421) >>>>> >>>>> >> java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:183) >>>>> java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172) >>>>> >> java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1511) >>>>> >>>>> >> java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1421) >>>>> >>>>> >> java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:456) >>>>> >>>>> >> java.base/sun.security.ssl.SSLSocketImpl.ensureNegotiated(SSLSocketImpl.java:926) >>>>> >> java.base/sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:372) >>>>> >>>>> >> org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.detectMode(UnifiedServerSocket.java:269) >>>>> >>>>> >> org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.getSocket(UnifiedServerSocket.java:298) >>>>> >>>>> >> org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.access$400(UnifiedServerSocket.java:172) >>>>> >>>>> >> org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.getRealInputStream(UnifiedServerSocket.java:699) >>>>> >>>>> >> org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:693) >>>>> java.base/java.io >> .BufferedInputStream.fill(BufferedInputStream.java:252) >>>>> java.base/java.io >> .BufferedInputStream.read(BufferedInputStream.java:271) >>>>> java.base/java.io.DataInputStream.readInt(DataInputStream.java:392) >>>>> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96) >>>>> >>>>> >> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86) >>>>> >> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134) >>>>> >> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:472) >>>>> {code} >>>>> >>>>> >>>>> >>>>> -- >>>>> This message was sent by Atlassian Jira >>>>> (v8.20.10#820010) >>>>> >>> >> >>