ASF GitHub Bot commented on ZOOKEEPER-2982:

GitHub user EronWright opened a pull request:


    ZOOKEEPER-2982: Re-try DNS hostname -> IP resolution if node connection fails

    This PR ports a fix, originally made in ZOOKEEPER-1506, from the 3.4 branch
    to the 3.5 branch.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/EronWright/zookeeper ZOOKEEPER-2982

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #468

commit 4f8f3ce8074b878f2a6e32c15ec177f4dcd050e0
Author: Eron Wright <eron.wright@...>
Date:   2018-02-19T23:05:44Z

    ZOOKEEPER-2982 Re-try DNS hostname -> IP resolution if node connection fails


> Re-try DNS hostname -> IP resolution
> ------------------------------------
>                 Key: ZOOKEEPER-2982
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2982
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.0, 3.5.1, 3.5.3
>            Reporter: Eron Wright 
>            Priority: Blocker
>             Fix For: 3.5.4
> ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4. Some portions of the fix 
> have not yet been ported to 3.5.
> To recap the outstanding problem in 3.5: if a given ZK server is started 
> before all peer addresses are resolvable, that server may cache a negative 
> lookup result and fail to resolve the address indefinitely. For example, 
> deploying ZK 3.5 to Kubernetes using a StatefulSet plus a headless Service 
> may fail because the DNS records are created lazily.
> {code}
> 2018-02-18 09:11:22,583 [myid:0] - WARN  [QuorumPeer[myid=0](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@95] - Exception when following the leader
> java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>         at java.net.Socket.connect(Socket.java:589)
>         at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
>         at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
>         at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {code}
> In the above example, the address `zk-2.zk.default.svc.cluster.local` was not 
> resolvable when the server started, but became resolvable shortly thereafter. 
> The server should eventually succeed but doesn't.
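
For readers unfamiliar with the failure mode described in the issue, the following is a minimal sketch of the general idea. It is not the actual ZooKeeper patch. It relies on the documented behaviour of java.net.InetSocketAddress: an address whose hostname could not be resolved at construction time is flagged as unresolved and stays that way, so a retry has to build a new address object from the hostname. The class name ReResolve, the helper reResolve, and the port 3888 are hypothetical and chosen for illustration only.

```java
import java.net.InetSocketAddress;

public class ReResolve {

    /**
     * Builds a fresh address from the same host and port. The constructor
     * attempts a new hostname lookup; if that lookup fails, the result is
     * again flagged as unresolved.
     */
    static InetSocketAddress reResolve(InetSocketAddress addr) {
        // getHostString() returns the hostname (or IP literal) without a reverse lookup.
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    public static void main(String[] args) {
        // Simulates a peer address that was not resolvable when the server started.
        // The port number is an assumption made for this sketch.
        InetSocketAddress stale =
                InetSocketAddress.createUnresolved("zk-2.zk.default.svc.cluster.local", 3888);
        System.out.println("stale unresolved: " + stale.isUnresolved());

        // Before each connection attempt, rebuild the address so that DNS is
        // consulted again instead of reusing the unresolved object.
        InetSocketAddress fresh = reResolve(stale);
        System.out.println("fresh unresolved: " + fresh.isUnresolved());
    }
}
```

Whether the second lookup succeeds depends on the DNS state at the time of the call. The JVM also keeps its own negative lookup cache, controlled by the networkaddress.cache.negative.ttl security property, so a retry issued within that window may still fail.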

This message was sent by Atlassian JIRA
