Thanks for your response, Enrico!

On Thu, Feb 1, 2024 at 10:25 AM Enrico Olivelli <eolive...@gmail.com> wrote:

> Sönke,
>
> Il giorno gio 1 feb 2024 alle ore 09:26 Sönke Liebau
> <soenke.lie...@stackable.tech.invalid> ha scritto:
> >
> > Hi all,
> >
> > we recently ran into issues with ZooKeeper on Kubernetes which caused us
> > to open [1] after a bit of analysis.
> >
> > We are happy to work on opening a PR to improve this behavior here, but I
> > wanted to start a discussion around what "improve" would look like
> > exactly before putting any effort into the PR.
> >
> > I'll keep this mail light on details - it is hopefully all covered in
> > the issue.
>
> It is good to also write some details here, sometimes people are lazy
> to open JIRA,
> and also it will be easier to add inline questions/answers

Fair point :)

The issue is that ZooKeeper performs a reverse DNS lookup on the IP of
incoming quorum connections and then compares the hostname from that
reverse lookup with the SAN entries in the certificate used for that
connection.
For this lookup it uses the Java method getHostName() [1], which can only
return a single string. When running in Kubernetes there is usually more
than one hostname that any given IP address can resolve to, and which one
this call returns is not deterministic.
There are issues in CoreDNS around this as well [2][3][4], but basically
what it boils down to is: you get a random hostname out of the list of
valid ones back, and which one that is keeps changing.
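
A simplified sketch of the check as described above, to make the failure
mode concrete. It is hypothetical and not the actual ZKTrustManager code:
exact matching only, no wildcard handling. The reverse lookup is represented
by a single pre-resolved name, since InetAddress.getHostName() can only ever
hand back one; the SANs use the Collection<List<?>> shape returned by
X509Certificate.getSubjectAlternativeNames(), where type 2 marks a dNSName.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical, simplified sketch of the reverse-lookup check (not the
 * actual ZKTrustManager code): the peer IP is reverse-resolved to ONE
 * hostname, which must then appear among the certificate's DNS SANs.
 */
final class ReverseLookupCheckSketch {

    // GeneralName tag for dNSName entries in the collection returned by
    // X509Certificate.getSubjectAlternativeNames().
    static final Integer SAN_TYPE_DNS = 2;

    private ReverseLookupCheckSketch() {
    }

    /** Collects lower-cased DNS SANs; rawSans may be null (no SAN extension). */
    static List<String> dnsSans(Collection<List<?>> rawSans) {
        List<String> names = new ArrayList<>();
        if (rawSans == null) {
            return names;
        }
        for (List<?> entry : rawSans) {
            if (entry.size() >= 2 && SAN_TYPE_DNS.equals(entry.get(0))) {
                names.add(entry.get(1).toString().toLowerCase(Locale.ROOT));
            }
        }
        return names;
    }

    /**
     * reverseLookupName stands in for InetAddress.getHostName() on the peer
     * address: a single name, even if the IP has several valid PTR records.
     * Exact match only; wildcard SANs are not handled in this sketch.
     */
    static boolean matches(String reverseLookupName, Collection<List<?>> rawSans) {
        return dnsSans(rawSans).contains(reverseLookupName.toLowerCase(Locale.ROOT));
    }
}
```

With a certificate whose only DNS SAN is
zk-0.zk-headless.default.svc.cluster.local (hypothetical names), the check
passes when the lookup happens to return that name and fails when it returns
another valid PTR name for the same pod IP, e.g.
zk-0.zk-other.default.svc.cluster.local.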

What this effectively meant for us was that on ZooKeeper restarts some pods
would be unable to connect for an unpredictable amount of time, because the
running servers refused their connections until DNS happened to return a
hostname that matched the certificate, after which all was well again.


> >
> > My basic question is: would people be okay with adding a check of the
> > certificate SAN entries against the hostnames from config?
>
> I think that this is good, especially if that can help people
> deploying ZK in k8s with
> security enabled. We should remove all the pain points for users.
>
> >
> > We cannot simply replace the existing check [2] of course, that'd run a
> > high risk of breaking existing setups, obvious options there would be to
> > either add a config option to replace the hostname check with this check,
> > or run this check in parallel with the hostname check and if either of
> > them succeeds allow the connection, but I'm sure there are many other
> > potential ways of doing this.
>
> Yes, we must add some flag, then we introduce it in the next major version,
> maybe it will become the new default behaviour at some point.
>

That sounds good to me!
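
To make the proposal a bit more concrete, here is a sketch of how such a flag
could combine the two checks. Everything in it is hypothetical: the class, the
Mode values and the method names are invented for illustration and do not
correspond to an existing ZooKeeper option. configuredHosts stands for the
hostnames parsed from the server.N entries in the config.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed check: accept a quorum peer if one of
 * its certificate's DNS SANs matches a hostname from the server.N entries in
 * the ZooKeeper config. Mode and all names here are invented for illustration.
 */
final class QuorumSanCheckSketch {

    /** Hypothetical values for the proposed config flag. */
    enum Mode {
        REVERSE_DNS_ONLY, // current behaviour, would stay the default
        CONFIG_ONLY,      // replace the reverse-lookup check
        EITHER            // accept if either check succeeds
    }

    // GeneralName tag for dNSName in X509Certificate.getSubjectAlternativeNames().
    private static final Integer SAN_TYPE_DNS = 2;

    private QuorumSanCheckSketch() {
    }

    private static Set<String> dnsSans(Collection<List<?>> rawSans) {
        Set<String> names = new HashSet<>();
        if (rawSans != null) {
            for (List<?> entry : rawSans) {
                if (entry.size() >= 2 && SAN_TYPE_DNS.equals(entry.get(0))) {
                    names.add(entry.get(1).toString().toLowerCase(Locale.ROOT));
                }
            }
        }
        return names;
    }

    /** True if any configured quorum hostname appears as a DNS SAN. */
    static boolean sanMatchesConfig(Collection<List<?>> rawSans, Collection<String> configuredHosts) {
        Set<String> sans = dnsSans(rawSans);
        for (String host : configuredHosts) {
            if (sans.contains(host.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    /** Existing style of check: the single reverse-lookup name must be a DNS SAN. */
    static boolean sanMatchesReverseLookup(Collection<List<?>> rawSans, String reverseLookupName) {
        return dnsSans(rawSans).contains(reverseLookupName.toLowerCase(Locale.ROOT));
    }

    static boolean accept(Mode mode, Collection<List<?>> rawSans,
                          String reverseLookupName, Collection<String> configuredHosts) {
        switch (mode) {
            case REVERSE_DNS_ONLY:
                return sanMatchesReverseLookup(rawSans, reverseLookupName);
            case CONFIG_ONLY:
                return sanMatchesConfig(rawSans, configuredHosts);
            case EITHER:
                return sanMatchesReverseLookup(rawSans, reverseLookupName)
                        || sanMatchesConfig(rawSans, configuredHosts);
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```

Note that the config check as sketched accepts a SAN matching any configured
quorum host rather than the specific peer, since as far as I can tell the
peer's server id is only exchanged after the TLS handshake; whether that is
strict enough would be worth discussing on the PR.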



>
> Thanks
> Enrico
>
> >
> > Any thoughts or opinions on this would be very appreciated.
> >
> > Best regards,
> > Sönke
> >
> >
> > [1] https://issues.apache.org/jira/browse/ZOOKEEPER-4790
> > [2]
> > https://github.com/apache/zookeeper/blob/11c07921c15e2fb7692375327b53f26a583b77ca/zookeeper-server/src/main/java/org/apache/zookeeper/common/ZKTrustManager.java#L158



[1] https://docs.oracle.com/javase/8/docs/api/java/net/InetAddress.html#getHostName--

[2] https://github.com/coredns/coredns/issues/3686
[3] https://github.com/coredns/coredns/pull/3687
[4] https://github.com/coredns/coredns/issues/4181
