Happy to split this into two patches, my main worry was that we notice in the second patch, that we are not compatible with the first one from a config perspective any more.
The proposed `clientHostnameVerification` parameters take a boolean value I think. One could argue that this parameter would also be appropriate for the new "hostname from list of peers" logic, as that is also a hostnameVerification, so instead of it being a bool it could become a list of allowed values [ "none", "peer", "full"]. With 'peer' being the new and improved thing, 'none' disabling all checks and 'full' being the current behavior. Full is probably not the correct term for this, maybe 'legacy' or 'dns' or something along these lines could be appropriate. It might be worth deciding on this up front, to make sure we don't have to backtrack at some point. Best regards, Sönke On Wed, Nov 20, 2024 at 2:31 AM Andor Molnar <an...@apache.org> wrote: > Sounds like a plan, but I would do the 2 things in separate patches: the > current patch covers the toggle for client hostname verification and I > think it’s already ready for submit. > > Another patch should cover the alternate way of hostname verification > based on what we discussed. Could you please update the Jira ticket(s) > accordingly to make sure we’re on the same page? > > Thanks, > Andor > > > > > On Nov 19, 2024, at 15:19, Sönke Liebau > <soenke.lie...@stackable.tech.INVALID> wrote: > > > > It would be nice, being able to find out which peer is trying to connect, > > but as you say, we won't have the id available before the handshake is > done > > (unless we have users stick it in the cert, and I don't think we want > that > > :) ) > > Anything we could do to find out would probably need to involve DNS and > > some dirty compromise.. so probably best to just not go there. > > > > Your idea for backwards compatible config sounds sensible to me. > > It would probably make sense to get this done as part of > > https://github.com/apache/zookeeper/pull/2173 and discuss possible > config > > changes up front for everything? > > > > Best, Sönke > > > > > > On Tue, 19 Nov 2024, 16:46 Andor Molnar, <an...@kazinczy.hu> wrote: > > > >> Correction, sorry: > >> > >>> Not necessarily one SAN matches, but _the one_ matches which > corresponds > >> to the peer. > >> > >> I meant not any of the listed server names matches in zoo.cfg, but we > >> might be able to identify which peer is trying to connect. This is > probably > >> not feasible, because at the time of TLS handshake the peer id is not > yet > >> available. > >> > >> We should do this in backward compatible way: client hostname > >> verification, by default, works at is, but we introduce an option to > >> disable reverse DNS lookups. That switches the behavior to check against > >> list of hostnames in zoo.cfg. In addition we enable client hostname > >> verification to be completely disabled too. > >> > >> Wdyt? > >> > >> Andor > >> > >> > >> > >> > >>> On Nov 19, 2024, at 09:38, Andor Molnar <an...@kazinczy.hu> wrote: > >>> > >>> Hi Sönke, > >>> > >>> Are you still working on the patch btw? > >>> https://github.com/apache/zookeeper/pull/2173 > >>> > >>> (Though I’m not sure if it was you who opened it) > >>> > >>> Thanks for the pointer, ZooKeeper support for K8s is unfortunately > still > >> far from perfect. Introducing the FIPS mode option might not have been > the > >> best choice, but we first faced the issue during FIPS deployment where > you > >> cannot use custom hostname verificator, so we had to disable it > completely. > >>> > >>> I’d like to idea of > >>> > >>> "The ZK server could verify the SAN against the list of servers > >> (servers.N in the config). A peer should be able to connect on the > quorum > >> port if and only if at least one SAN matches at least one of the listed > >> servers.” > >>> > >>> Not necessarily one SAN matches, but _the one_ matches which > corresponds > >> to the peer. Does that make sense? Not sure if technically feasible, but > >> are you willing to create a patch? > >>> > >>> Regards, > >>> Andor > >>> > >>> > >>> > >>> > >>>> On Nov 19, 2024, at 03:11, Sönke Liebau > >> <soenke.lie...@stackable.tech.INVALID> wrote: > >>>> > >>>> You are probably running into ZOOKEEPER-4790 [1] here. > >>>> > >>>> When we encountered this back in the day [2] we figured out that > >> enabling > >>>> FIPS mode bypasses all the ZK specific TLS checks and makes it work. > In > >> the > >>>> ZK version you are on it is not yet enabled by default, you could > either > >>>> update or set *zookeeper.fips-mode *and this error _should_ go away. > >>>> > >>>> Good luck :) > >>>> > >>>> Best, > >>>> Sönke > >>>> > >>>> [1] https://issues.apache.org/jira/browse/ZOOKEEPER-4790 > >>>> [2] https://github.com/stackabletech/zookeeper-operator/issues/760 > >>>> > >>>> On Tue, Nov 19, 2024 at 9:08 AM Dharani (Jira) <j...@apache.org> > wrote: > >>>> > >>>>> Dharani created ZOOKEEPER-4887: > >>>>> ---------------------------------- > >>>>> > >>>>> Summary: Zookeeper quorum formation fails when TLS is > >> enabled > >>>>> in k8s env > >>>>> Key: ZOOKEEPER-4887 > >>>>> URL: > >> https://issues.apache.org/jira/browse/ZOOKEEPER-4887 > >>>>> Project: ZooKeeper > >>>>> Issue Type: Bug > >>>>> Affects Versions: 3.8.3 > >>>>> Reporter: Dharani > >>>>> > >>>>> > >>>>> We have three(3) node zookeeper cluster running as a pod on > Kubernetes > >>>>> cluster, zookeeper quorum formation fails with TLS handshake error, > as > >> the > >>>>> server name in the https request does not match with any of the SANs > >> in the > >>>>> certificate configured for zookeeper server. Server name in the > >> request is > >>>>> of the form "x-x-x-x.kubernetes.default.svc.cluster.local" (where > >> x-x-x-x > >>>>> is the IP address of the POD), and I am unable to understand the > reason > >>>>> behind pre-pending FQDN with a IP address. > >>>>> > >>>>> > >>>>> > >>>>> Please find below the extract of the error logs from the zookeeper > POD > >>>>> {code:java} > >>>>> [myid:] - ERROR [LearnerHandler-/192.168.220. > >>>>> 10:46516:o.a.z.c.ZKTrustManager@191] - Failed to verify host > address: > >>>>> 192.168.220.10 > >>>>> javax.net.ssl.SSLPeerUnverifiedException: Certificate for > >> <192.168.220.10> > >>>>> doesn't match any of the subject alternative names: > >>>>> [eric-data-coordinator-zk, eric-data-coordinator-zk.zdhagxx1, > >>>>> eric-data-coordinator-zk.zdhagxx1.svc, > >>>>> eric-data-coordinator-zk.zdhagxx1.svc.cluster.local, > >>>>> > *.eric-data-coordinator-zk-ensemble-service.zdhagxx1.svc.cluster.local, > >>>>> certified-scrape-target] > >>>>> > >>>>> > >> > org.apache.zookeeper.common.ZKHostnameVerifier.matchIPAddress(ZKHostnameVerifier.java:197) > >>>>> > >>>>> > >> > org.apache.zookeeper.common.ZKHostnameVerifier.verify(ZKHostnameVerifier.java:165) > >>>>> > >>>>> > >> > org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:180) > >>>>> > >>>>> > >> > org.apache.zookeeper.common.ZKTrustManager.checkClientTrusted(ZKTrustManager.java:93) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkClientCerts(CertificateMessage.java:1285) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1204) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1181) > >>>>> > java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:183) > >>>>> java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172) > >>>>> > >> java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1511) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1421) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:456) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.SSLSocketImpl.ensureNegotiated(SSLSocketImpl.java:926) > >>>>> > >> > java.base/sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:372) > >>>>> > >>>>> > >> > org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.detectMode(UnifiedServerSocket.java:269) > >>>>> > >>>>> > >> > org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.getSocket(UnifiedServerSocket.java:298) > >>>>> > >>>>> > >> > org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.access$400(UnifiedServerSocket.java:172) > >>>>> > >> > org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.getRealInputStream(UnifiedServerSocket.java:699) > >>>>> > >> > org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:693) > >>>>> java.base/java.io > >> .BufferedInputStream.fill(BufferedInputStream.java:252) > >>>>> java.base/java.io > >> .BufferedInputStream.read(BufferedInputStream.java:271) > >>>>> java.base/java.io.DataInputStream.readInt(DataInputStream.java:392) > >>>>> > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96) > >>>>> > >>>>> > >> > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86) > >>>>> > >> > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134) > >>>>> > >> > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:472)[myid:] > >>>>> - ERROR [LearnerHandler-/192.168.220. > >> 10:46516:o.a.z.c.ZKTrustManager@192] > >>>>> - Failed to verify hostname: > >>>>> 192-168-220-10.eric-data-coordinator-zk.zdhagxx1.svc.cluster.local > >>>>> javax.net.ssl.SSLPeerUnverifiedException: Certificate for > >>>>> <192-168-220-10.eric-data-coordinator-zk.zdhagxx1.svc.cluster.local> > >>>>> doesn't match any of the subject alternative names: > >>>>> [eric-data-coordinator-zk, eric-data-coordinator-zk.zdhagxx1, > >>>>> eric-data-coordinator-zk.zdhagxx1.svc, > >>>>> eric-data-coordinator-zk.zdhagxx1.svc.cluster.local, > >>>>> > *.eric-data-coordinator-zk-ensemble-service.zdhagxx1.svc.cluster.local, > >>>>> certified-scrape-target] > >>>>> > >> > org.apache.zookeeper.common.ZKHostnameVerifier.matchDNSName(ZKHostnameVerifier.java:230) > >>>>> > >> > org.apache.zookeeper.common.ZKHostnameVerifier.verify(ZKHostnameVerifier.java:171) > >>>>> > >>>>> > >> > org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:189) > >>>>> > >>>>> > >> > org.apache.zookeeper.common.ZKTrustManager.checkClientTrusted(ZKTrustManager.java:93) > >>>>> > >> > java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkClientCerts(CertificateMessage.java:1285) > >>>>> > >> > java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1204) > >>>>> > >> > java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1181) > >>>>> > java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:183) > >>>>> java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172) > >>>>> > >> java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1511) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1421) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:456) > >>>>> > >>>>> > >> > java.base/sun.security.ssl.SSLSocketImpl.ensureNegotiated(SSLSocketImpl.java:926) > >>>>> > >> > java.base/sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:372) > >>>>> > >>>>> > >> > org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.detectMode(UnifiedServerSocket.java:269) > >>>>> > >>>>> > >> > org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.getSocket(UnifiedServerSocket.java:298) > >>>>> > >>>>> > >> > org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.access$400(UnifiedServerSocket.java:172) > >>>>> > >>>>> > >> > org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.getRealInputStream(UnifiedServerSocket.java:699) > >>>>> > >>>>> > >> > org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:693) > >>>>> java.base/java.io > >> .BufferedInputStream.fill(BufferedInputStream.java:252) > >>>>> java.base/java.io > >> .BufferedInputStream.read(BufferedInputStream.java:271) > >>>>> java.base/java.io.DataInputStream.readInt(DataInputStream.java:392) > >>>>> > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96) > >>>>> > >>>>> > >> > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86) > >>>>> > >> > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134) > >>>>> > >> > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:472) > >>>>> {code} > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> This message was sent by Atlassian Jira > >>>>> (v8.20.10#820010) > >>>>> > >>> > >> > >> > >