ASF GitHub Bot commented on ZOOKEEPER-2982:
GitHub user EronWright opened a pull request:
ZOOKEEPER-2982: Re-try DNS hostname -> IP resolution if node connection fails
This PR ports a fix, originally made in ZOOKEEPER-1506, from the 3.4 branch
to the 3.5 branch.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/EronWright/zookeeper ZOOKEEPER-2982
Alternatively you can review and apply these changes as the patch at:
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #468
Author: Eron Wright <eron.wright@...>
ZOOKEEPER-2982 Re-try DNS hostname -> IP resolution if node connection fails
> Re-try DNS hostname -> IP resolution
> Key: ZOOKEEPER-2982
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2982
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.0, 3.5.1, 3.5.3
> Reporter: Eron Wright
> Priority: Blocker
> Fix For: 3.5.4
> ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4. Some portions of the fix
> haven't yet been ported to 3.5.
> To recap the outstanding problem in 3.5, if a given ZK server is started
> before all peer addresses are resolvable, that server may cache a negative
> lookup result and forever fail to resolve the address. For example,
> deploying ZK 3.5 to Kubernetes using a StatefulSet plus a headless Service
> may fail because the DNS records are created lazily.
> 2018-02-18 09:11:22,583 [myid:0] - WARN
> - Exception when following the leader
> java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> In the above example, the address `zk-2.zk.default.svc.cluster.local` was not
> resolvable when the server started, but became resolvable shortly thereafter.
> The server should eventually succeed but doesn't.
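The behaviour asked for above (look the hostname up again after a failed connection instead of holding on to the first result) can be illustrated with a minimal sketch. This is not the code from the PR or from ZOOKEEPER-1506; the class `ReResolvingAddress`, its `Resolver` interface and the method `recreate()` are hypothetical names chosen for illustration, and the lookup is pluggable only so the example can be run without real DNS.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

/** Hypothetical helper for illustration; not ZooKeeper's actual class. */
class ReResolvingAddress {
    /** Pluggable lookup so the sketch can be exercised without real DNS. */
    interface Resolver {
        InetAddress lookup(String host) throws UnknownHostException;
    }

    private final String host;
    private final int port;
    private final Resolver resolver;
    private volatile InetSocketAddress current;

    ReResolvingAddress(String host, int port, Resolver resolver) {
        this.host = host;
        this.port = port;
        this.resolver = resolver;
        // Start unresolved so startup does not depend on DNS being ready.
        this.current = InetSocketAddress.createUnresolved(host, port);
    }

    /** Call after a failed connection attempt: resolve the hostname again. */
    InetSocketAddress recreate() {
        try {
            current = new InetSocketAddress(resolver.lookup(host), port);
        } catch (UnknownHostException e) {
            // Still not resolvable: keep the previous value. The next failed
            // connection attempt triggers another lookup.
        }
        return current;
    }

    /** Address to use for the next connection attempt. */
    InetSocketAddress get() {
        return current;
    }
}
```

In real use the resolver would presumably be `InetAddress::getByName`, with the connection loop calling `recreate()` whenever `Socket.connect` throws. Note that the JVM keeps its own negative lookup cache (the `networkaddress.cache.negative.ttl` security property), so a fresh lookup may still fail for a short time after the DNS record appears.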