I attempted to run ZK 3.5.3-beta in a Kubernetes cluster, using the typical
approach of a StatefulSet plus a pair of Services. I observed that some
of my ZK servers would indefinitely fail to resolve the DNS names of their
peers. It is normal that addresses cannot be resolved immediately
at startup because the records are created asynchronously by Kubernetes.
One would expect ZK to keep trying and eventually succeed. Note that
this issue affects 3.5 only; 3.4 seems to work fine.
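
To illustrate the behavior one would expect, here is a minimal sketch of re-resolving a peer hostname on every attempt instead of holding on to the first failed lookup. This is not the ZooKeeper code and not the actual patch; the class name, the Resolver hook, and the attempt/delay parameters are all hypothetical and exist only for illustration.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class ReResolveSketch {

    /** Hypothetical lookup hook so the resolver can be swapped out; not a ZooKeeper API. */
    public interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /**
     * Performs a fresh lookup on each attempt. A lookup that fails at startup
     * (for example because the DNS record has not been created yet) is retried
     * rather than remembered.
     */
    public static InetSocketAddress resolveWithRetry(
            String host, int port, int maxAttempts, long delayMs, Resolver resolver)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // New lookup every time; nothing from the previous attempt is reused.
                return new InetSocketAddress(resolver.resolve(host), port);
            } catch (UnknownHostException e) {
                // Record not visible yet: wait, then try again.
                Thread.sleep(delayMs);
            }
        }
        // Still unknown after all attempts; the caller can retry later.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) throws InterruptedException {
        // Port and retry values are arbitrary illustration values.
        InetSocketAddress addr =
                resolveWithRetry("localhost", 2888, 3, 500L, InetAddress::getByName);
        System.out.println(addr + " unresolved=" + addr.isUnresolved());
    }
}
```

Note that the JVM also caches failed lookups for a short time (the networkaddress.cache.negative.ttl security property), so a retry loop like this only sees a new record once that cache entry has expired.
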
I tracked the root cause down to a regression in 3.5. ZOOKEEPER-1506 made
an improvement in 3.4 that wasn't ported to 3.5. I opened ZOOKEEPER-2982 to
track this, and have a PR ready. Could we aim to get the fix into 3.5.4?