Suhas Dantkale created ZOOKEEPER-3723:
-----------------------------------------

             Summary: Zookeeper Client should not fail with ZSYSTEMERROR if DNS does not resolve one of the servers in the zk ensemble.
                 Key: ZOOKEEPER-3723
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3723
             Project: ZooKeeper
          Issue Type: Improvement
          Components: c client, java client
            Reporter: Suhas Dantkale


This is a minor enhancement request: the client should not fail session 
initiation if DNS cannot resolve the hostname of one of the servers in the 
ZooKeeper ensemble.

 

When establishing a session, the ZooKeeper client resolves the hostnames of 
all the servers in the ensemble.

In a Kubernetes environment with CoreDNS, a pod's hostname entry is removed 
from CoreDNS while the pod restarts. Although the CoreDNS settings can be tuned 
to delay removal of the entry, we don't want to leave a race in which the 
ZooKeeper client tries to establish a session and fails only because DNS 
temporarily cannot resolve one hostname. As long as at least one server in the 
ensemble is resolvable, shouldn't the client skip the unresolvable host and try 
to connect to one of the other servers, instead of failing session 
establishment with a hard error?
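A minimal sketch of the proposed policy, assuming a hypothetical resolver callback (resolve_fn) that returns how many addresses it found for a host; in the real C client that role would be played by getaddrinfo(). The names resolve_ensemble, demo_resolve, RESOLVE_OK and RESOLVE_NONE are illustrative only and are not part of the ZooKeeper API. An unresolvable host is logged and skipped, and the call fails only when no server in the ensemble resolves:

{code:c}
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical resolver callback: returns the number of addresses found for
 * `host`, or 0 if the name could not be resolved. A real client would wrap
 * getaddrinfo() here; it is injected so the policy can be shown without DNS. */
typedef size_t (*resolve_fn)(const char *host);

/* Illustrative return codes for this sketch (not ZooKeeper's ZOK/ZSYSTEMERROR). */
enum { RESOLVE_OK = 0, RESOLVE_NONE = -1 };

/* Resolve every server in the ensemble, tolerating individual failures:
 * an unresolvable host is logged and skipped, and the call fails only when
 * no server at all could be resolved. *total receives the address count. */
int resolve_ensemble(const char *const *hosts, size_t n_hosts,
                     resolve_fn resolve, size_t *total)
{
    size_t addrs = 0;
    for (size_t i = 0; i < n_hosts; i++) {
        size_t found = resolve(hosts[i]);
        if (found == 0) {
            /* Where the current client aborts with ZSYSTEMERROR,
             * skip this server and keep going. */
            fprintf(stderr, "warning: cannot resolve %s, skipping\n", hosts[i]);
            continue;
        }
        addrs += found;
    }
    *total = addrs;
    return addrs > 0 ? RESOLVE_OK : RESOLVE_NONE;
}

/* Demo resolver only: pretends any host whose name starts with "down"
 * (e.g. a restarting pod) is missing from DNS, and every other host
 * resolves to exactly one address. */
static size_t demo_resolve(const char *host)
{
    return strncmp(host, "down", 4) == 0 ? 0 : 1;
}
{code}

With hosts {"zk-0", "down-zk-1", "zk-2"} and demo_resolve, resolve_ensemble would return RESOLVE_OK with a total of two addresses, since only the second host is treated as missing from DNS; it returns RESOLVE_NONE only when every host fails.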

 

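In the snippet below, taken from resolve_hosts() in the C client, a getaddrinfo() failure for any single host sets rc to ZSYSTEMERROR and jumps to the fail label, so the whole call fails.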
{code:c}
if ((rc = getaddrinfo(host, port_spec, &hints, &res0)) != 0) {
            //bug in getaddrinfo implementation when it returns
            //EAI_BADFLAGS or EAI_ADDRFAMILY with AF_UNSPEC and
            // ai_flags as AI_ADDRCONFIG
#ifdef AI_ADDRCONFIG
            if ((hints.ai_flags == AI_ADDRCONFIG) &&
// ZOOKEEPER-1323 EAI_NODATA and EAI_ADDRFAMILY are deprecated in FreeBSD.
#ifdef EAI_ADDRFAMILY
                ((rc ==EAI_BADFLAGS) || (rc == EAI_ADDRFAMILY))) {
#else
                (rc == EAI_BADFLAGS)) {
#endif
                //reset ai_flags to null
                hints.ai_flags = 0;
                //retry getaddrinfo
                rc = getaddrinfo(host, port_spec, &hints, &res0);
            }
#endif
            if (rc != 0) {
                errno = getaddrinfo_errno(rc);
#ifdef _WIN32
                LOG_ERROR(LOGCALLBACK(zh), "Win32 message: %s\n", gai_strerror(rc));
#elif __linux__ && __GNUC__
                LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n", gai_strerror(rc));
#else
                LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n", strerror(errno));
#endif
                rc=ZSYSTEMERROR;
                goto fail;
            }
        }
{code}
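For comparison, one way the single-host step could report failure instead of jumping to fail is sketched below. It keeps the AI_ADDRCONFIG retry from the snippet, but logs the getaddrinfo() error and returns 0 so the caller can move on to the next server. resolve_one is a hypothetical helper, and the hints (AF_UNSPEC, SOCK_STREAM, IPPROTO_TCP, AI_ADDRCONFIG) are an assumption about what resolve_hosts() passes; they should be checked against the actual client source.

{code:c}
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve a single server. Returns the number of
 * addresses found, or 0 if getaddrinfo() failed (and sets *res to NULL).
 * On success the caller must release the list with freeaddrinfo(*res). */
size_t resolve_one(const char *host, const char *port_spec,
                   struct addrinfo **res)
{
    struct addrinfo hints;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* assumed, see lead-in */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_protocol = IPPROTO_TCP;
#ifdef AI_ADDRCONFIG
    hints.ai_flags = AI_ADDRCONFIG;
#endif

    *res = NULL;
    int rc = getaddrinfo(host, port_spec, &hints, res);
#ifdef AI_ADDRCONFIG
    /* Same workaround as the quoted code: some getaddrinfo implementations
     * reject AI_ADDRCONFIG with AF_UNSPEC, so retry without the flag. */
    if (rc != 0 && hints.ai_flags == AI_ADDRCONFIG &&
#ifdef EAI_ADDRFAMILY
        (rc == EAI_BADFLAGS || rc == EAI_ADDRFAMILY)) {
#else
        rc == EAI_BADFLAGS) {
#endif
        hints.ai_flags = 0;
        rc = getaddrinfo(host, port_spec, &hints, res);
    }
#endif
    if (rc != 0) {
        /* Instead of rc = ZSYSTEMERROR; goto fail; log and report "no
         * addresses" so the caller can continue with the next server. */
        fprintf(stderr, "getaddrinfo(%s): %s; skipping this server\n",
                host, gai_strerror(rc));
        *res = NULL;
        return 0;
    }

    /* Count the addresses returned for this host. */
    size_t count = 0;
    for (const struct addrinfo *ai = *res; ai != NULL; ai = ai->ai_next)
        count++;
    return count;
}
{code}

A caller would then treat a return value of 0 as "skip this host" and fail with ZSYSTEMERROR only if every host in the connect string produced no addresses.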



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
