[
https://issues.apache.org/jira/browse/MESOS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522259#comment-14522259
]
Raul Gutierrez Segales commented on MESOS-2681:
-----------------------------------------------
Yeah, if you are getting a new zk handle after 10s via a zookeeper_init() call,
that would trigger a DNS lookup.
I think we saw this in prod, but it might have been due to some DNS servers not
being up to date.
> Slave process must restart to update ensemble members
> -----------------------------------------------------
>
> Key: MESOS-2681
> URL: https://issues.apache.org/jira/browse/MESOS-2681
> Project: Mesos
> Issue Type: Bug
> Components: slave
> Reporter: Joe Smith
>
> Right now, if a ZooKeeper ensemble has (for instance) more observers added to
> it, the Mesos Slaves will not see them and will continue to attempt to connect
> only to the original members. A restart of the slave process is required to call
> {{getaddrinfo}} again and enumerate the list of hosts in the ensemble.
> Subsequent {{getaddrinfo}} calls _will only_ occur when {{zookeeper_init()}}
> is called again, that is to say: when the old session expires and you need to
> create a new one. If you swap all hosts in your ensemble too fast, without
> permitting time for old sessions to expire, you'd end up with clients looping
> forever, trying to connect to the old servers in order to get their old
> sessions expired.
> This is best tracked by ZOOKEEPER-1998, where there is some discussion about
> a necessary improvement to the implementation already in the 3.5.x branch, or
> putting this functionality (debatably a feature vs. fixing a bug) in 3.4.x.
> (Thanks to [~rgs] for reviewing this as well)
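
For illustration only, the resolve-once behaviour described in the issue can be
sketched as a toy model. This is not the ZooKeeper C client or Mesos code: the
class ResolveOnceClient, its methods, the hostname zk.example.com, and the
addresses are all hypothetical, and the dict named dns merely stands in for a
real getaddrinfo lookup. It assumes only what the description states, namely
that hostnames are resolved when a handle is created and not on reconnects.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for DNS: maps an ensemble hostname to its addresses.
Resolver = Callable[[str], List[str]]


class ResolveOnceClient:
    """Toy client that resolves the ensemble name only when a handle is created."""

    def __init__(self, host: str, resolver: Resolver) -> None:
        self._host = host
        self._resolver = resolver
        self.servers: List[str] = []
        self.init_handle()

    def init_handle(self) -> None:
        # Loosely analogous to zookeeper_init(): the only place a lookup happens.
        self.servers = list(self._resolver(self._host))

    def reconnect(self) -> List[str]:
        # Reconnecting within the same session reuses the cached address list.
        return self.servers

    def on_session_expired(self) -> List[str]:
        # A new session needs a new handle, which triggers a fresh lookup.
        self.init_handle()
        return self.servers


dns: Dict[str, List[str]] = {"zk.example.com": ["10.0.0.1", "10.0.0.2"]}
client = ResolveOnceClient("zk.example.com", lambda h: dns[h])

# An observer is added to the ensemble's DNS record after the client started.
dns["zk.example.com"] = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

stale = list(client.reconnect())           # still the original two members
fresh = list(client.on_session_expired())  # new member visible after re-init
```

In this model the new member only shows up once the session expires, which is
why swapping every host before old sessions expire would leave such a client
with no reachable server in its cached list.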