[ 
https://issues.apache.org/jira/browse/MESOS-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-3790:
-------------------------------
    Assignee:     (was: Neil Conway)

> Zk connection should retry on EAI_NONAME
> ----------------------------------------
>
>                 Key: MESOS-3790
>                 URL: https://issues.apache.org/jira/browse/MESOS-3790
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Neil Conway
>            Priority: Minor
>              Labels: mesosphere, zookeeper
>
> The ZooKeeper interface is designed to retry (once per second for up to ten 
> minutes) if one or more of the ZooKeeper hostnames can't be resolved (see 
> [MESOS-1326] and [MESOS-1523]).
> The current implementation assumes that a DNS resolution failure is 
> indicated by zookeeper_init() returning NULL with errno set to EINVAL 
> (Zk translates getaddrinfo() failures into errno values). However, the 
> current Zk code maps those failures as follows:
> {code}
> static int getaddrinfo_errno(int rc) {
>     switch(rc) {
>     case EAI_NONAME:
> // ZOOKEEPER-1323 EAI_NODATA and EAI_ADDRFAMILY are deprecated in FreeBSD.
> #if defined EAI_NODATA && EAI_NODATA != EAI_NONAME
>     case EAI_NODATA:
> #endif
>         return ENOENT;
>     case EAI_MEMORY:
>         return ENOMEM;
>     default:
>         return EINVAL;
>     }
> }
> {code}
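> getaddrinfo() returns EAI_NONAME when "the node or service is not known"; per 
> discussion in [MESOS-2186], this seems to happen intermittently due to DNS 
> failures. Because Zk maps EAI_NONAME to ENOENT rather than EINVAL, the 
> retry logic does not treat this case as a resolution failure.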
> Proposed fix: looking at errno is always going to be somewhat fragile, but if 
> we're going to continue doing that, we should check for ENOENT as well as 
> EINVAL.
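
The proposed fix could be sketched roughly as below. This is an illustration only, not the actual Mesos patch: is_dns_failure(), init_with_retry(), the init_fn callback type, and fake_init() are hypothetical names, and the callback stands in for whatever code wraps zookeeper_init(). It assumes only what the issue states, namely that a failed init returns NULL and leaves either EINVAL or ENOENT in errno when hostname resolution failed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does this errno value plausibly indicate that
 * hostname resolution failed? Accepts ENOENT as well as EINVAL. */
static bool is_dns_failure(int err) {
    return err == EINVAL || err == ENOENT;
}

/* Hypothetical stand-in for a wrapper around zookeeper_init():
 * returns a handle, or NULL with errno set. */
typedef void *(*init_fn)(void);

/* Calls init() up to max_attempts times, retrying only when the failure
 * looks like a DNS resolution error. wait_fn (may be NULL) is called
 * between attempts, e.g. to sleep for one second. */
static void *init_with_retry(init_fn init, int max_attempts,
                             void (*wait_fn)(void)) {
    int err = 0;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        errno = 0;
        void *handle = init();
        if (handle != NULL) {
            return handle;
        }
        err = errno;               /* save before anything can clobber it */
        if (!is_dns_failure(err)) {
            break;                 /* not a resolution failure: give up */
        }
        if (wait_fn != NULL && attempt + 1 < max_attempts) {
            wait_fn();
        }
    }
    errno = err;                   /* report the last init failure */
    return NULL;
}

/* Demo-only fake: fails twice with ENOENT, then returns a dummy handle. */
static int fake_calls = 0;
static void *fake_init(void) {
    static int dummy;
    if (++fake_calls <= 2) {
        errno = ENOENT;
        return NULL;
    }
    return &dummy;
}
```

With max_attempts set to 600 and a wait_fn that sleeps for one second, this would approximate the "once per second for up to ten minutes" policy described in the issue. Checking errno remains fragile, as the issue notes, since it depends on how Zk maps getaddrinfo() results.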



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
