[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko resolved ZOOKEEPER-3723.
-----------------------------------------
    Resolution: Fixed

> Zookeeper Client should not fail with ZSYSTEMERROR if DNS does not resolve 
> one of the servers in the zk ensemble. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3723
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3723
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: c client, java client
>    Affects Versions: 3.5.5
>            Reporter: Suhas Dantkale
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.5.8
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is a minor enhancement request: session initiation should not fail when 
> DNS cannot resolve the hostname of one of the servers in the ZooKeeper 
> ensemble.
>  
> The ZooKeeper client resolves all the hostnames in the ensemble while 
> establishing the session.
> In a Kubernetes environment with CoreDNS, the hostname entry is removed from 
> CoreDNS while a pod restarts. Although we can adjust the CoreDNS settings to 
> delay the removal of the hostname entry, we don't want to leave any race in 
> which the ZooKeeper client tries to establish a session and fails because DNS 
> is temporarily unable to resolve a hostname. So, as long as at least one 
> server in the ensemble is resolvable through DNS, should the client avoid 
> failing session establishment with a hard error and instead try to connect 
> to one of the other servers?
>  
> The snippet below shows where resolve_hosts() fails with ZSYSTEMERROR.
> {code:java}
> if ((rc = getaddrinfo(host, port_spec, &hints, &res0)) != 0) {
>     // bug in getaddrinfo implementation when it returns
>     // EAI_BADFLAGS or EAI_ADDRFAMILY with AF_UNSPEC and
>     // ai_flags as AI_ADDRCONFIG
> #ifdef AI_ADDRCONFIG
>     if ((hints.ai_flags == AI_ADDRCONFIG) &&
> // ZOOKEEPER-1323 EAI_NODATA and EAI_ADDRFAMILY are deprecated in FreeBSD.
> #ifdef EAI_ADDRFAMILY
>         ((rc == EAI_BADFLAGS) || (rc == EAI_ADDRFAMILY))) {
> #else
>         (rc == EAI_BADFLAGS)) {
> #endif
>         // reset ai_flags to null
>         hints.ai_flags = 0;
>         // retry getaddrinfo
>         rc = getaddrinfo(host, port_spec, &hints, &res0);
>     }
> #endif
>     if (rc != 0) {
>         errno = getaddrinfo_errno(rc);
> #ifdef _WIN32
>         LOG_ERROR(LOGCALLBACK(zh), "Win32 message: %s\n", gai_strerror(rc));
> #elif __linux__ && __GNUC__
>         LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n", gai_strerror(rc));
> #else
>         LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n", strerror(errno));
> #endif
>         rc = ZSYSTEMERROR;
>         goto fail;
>     }
> }
> {code}
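
The behaviour requested above could look roughly like the following sketch. It is not the change that was committed for this issue. The helper name resolve_reachable_hosts(), its parameters, and the -1 return value are hypothetical and chosen only for illustration; a real client would map the failure to its own error code and keep the resolved addresses instead of freeing them. The sketch assumes a POSIX system, and the ai_flags parameter exists mainly so the loop can be exercised offline with AI_NUMERICHOST.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper (not part of the ZooKeeper C client).
 * Tries to resolve every host in the list and skips the ones that fail.
 * Returns 0 if at least one host resolved, -1 if none did.
 * The number of hosts that resolved is stored in *resolved_out.
 */
static int resolve_reachable_hosts(const char *const hosts[], size_t nhosts,
                                   const char *port, int ai_flags,
                                   size_t *resolved_out)
{
    size_t resolved = 0;

    assert(hosts != NULL && port != NULL && resolved_out != NULL);

    for (size_t i = 0; i < nhosts; i++) {
        struct addrinfo hints;
        struct addrinfo *res = NULL;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;      /* accept IPv4 and IPv6 */
        hints.ai_socktype = SOCK_STREAM;  /* TCP */
        hints.ai_flags = ai_flags;

        int rc = getaddrinfo(hosts[i], port, &hints, &res);
        if (rc != 0) {
            /* One unresolvable host is not fatal: report it and move on. */
            fprintf(stderr, "skipping %s: %s\n", hosts[i], gai_strerror(rc));
            continue;
        }

        /* A real client would store the addresses; the sketch only counts. */
        resolved++;
        freeaddrinfo(res);
    }

    *resolved_out = resolved;
    /* Fail hard only when no server in the list could be resolved. */
    return resolved > 0 ? 0 : -1;
}
```

Calling the helper with ai_flags set to 0 performs ordinary DNS lookups. With a list such as { "zk-0", "zk-1", "zk-2" } (placeholder names), a temporarily missing DNS entry for one pod would then only be logged, and the session could still be attempted against the remaining servers.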



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
