Suhas Dantkale created ZOOKEEPER-3723:
-----------------------------------------
Summary: Zookeeper Client should not fail with ZSYSTEMERROR if DNS
does not resolve one of the servers in the zk ensemble.
Key: ZOOKEEPER-3723
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3723
Project: ZooKeeper
Issue Type: Improvement
Components: c client, java client
Reporter: Suhas Dantkale
This is a minor enhancement request: session initiation should not fail when
DNS cannot resolve the hostname of one of the servers in the ZooKeeper
ensemble.
The ZooKeeper client resolves every hostname in the ensemble while
establishing the session.
In a Kubernetes environment with CoreDNS, a pod's hostname entry is removed
from CoreDNS while the pod restarts. Although we could tune the CoreDNS
settings to delay that removal, we don't want to leave a race in which the
ZooKeeper client tries to establish a session and fails because DNS
temporarily cannot resolve one hostname. So, as long as at least one server
in the ensemble is resolvable, could we avoid failing session establishment
with a hard error and instead try to connect to one of the other servers?
The snippet below shows where resolve_hosts() fails with ZSYSTEMERROR.
{code:java}
if ((rc = getaddrinfo(host, port_spec, &hints, &res0)) != 0) {
    //bug in getaddrinfo implementation when it returns
    //EAI_BADFLAGS or EAI_ADDRFAMILY with AF_UNSPEC and
    // ai_flags as AI_ADDRCONFIG
#ifdef AI_ADDRCONFIG
    if ((hints.ai_flags == AI_ADDRCONFIG) &&
    // ZOOKEEPER-1323 EAI_NODATA and EAI_ADDRFAMILY are deprecated in FreeBSD.
#ifdef EAI_ADDRFAMILY
        ((rc == EAI_BADFLAGS) || (rc == EAI_ADDRFAMILY))) {
#else
        (rc == EAI_BADFLAGS)) {
#endif
        //reset ai_flags to null
        hints.ai_flags = 0;
        //retry getaddrinfo
        rc = getaddrinfo(host, port_spec, &hints, &res0);
    }
#endif
    if (rc != 0) {
        errno = getaddrinfo_errno(rc);
#ifdef _WIN32
        LOG_ERROR(LOGCALLBACK(zh), "Win32 message: %s\n",
                  gai_strerror(rc));
#elif __linux__ && __GNUC__
        LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n",
                  gai_strerror(rc));
#else
        LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n",
                  strerror(errno));
#endif
        rc = ZSYSTEMERROR;
        goto fail;
    }
}
{code}
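To illustrate the behaviour requested above, here is a minimal sketch of a tolerant resolution loop. It is not the ZooKeeper client's code: resolve_tolerant() and its parameters are hypothetical names made up for this example. The idea is to call getaddrinfo() per host, skip the hosts that fail, and let the caller raise a hard error (such as ZSYSTEMERROR) only when nothing resolved.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper (not part of the ZooKeeper C client).
 * Resolves each of the nhosts entries in hosts[] on the given port.
 * results[i] receives the addrinfo list for hosts[i], or NULL if that
 * host could not be resolved. Returns the number of hosts that resolved;
 * the caller would treat 0 as the hard-failure case.
 * Every non-NULL results[i] must be released with freeaddrinfo().
 */
static int resolve_tolerant(const char *const *hosts, size_t nhosts,
                            const char *port, const struct addrinfo *hints,
                            struct addrinfo **results)
{
    int resolved = 0;

    assert(hosts != NULL && results != NULL);

    for (size_t i = 0; i < nhosts; i++) {
        results[i] = NULL;
        int rc = getaddrinfo(hosts[i], port, hints, &results[i]);
        if (rc != 0) {
            /* Log and move on instead of aborting the whole session. */
            fprintf(stderr, "skipping %s: %s\n", hosts[i], gai_strerror(rc));
            results[i] = NULL;
            continue;
        }
        resolved++;
    }
    return resolved;
}
```

A caller would pass the ensemble's host list and fail only when the return value is 0. This sketch leaves out the AI_ADDRCONFIG retry shown in the snippet above, and it does not retry skipped hosts later; a real change would probably need both.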