[
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503564#comment-14503564
]
Raul Gutierrez Segales commented on ZOOKEEPER-1506:
---------------------------------------------------
I am running elections (in a cluster of 5 participants + 1 observer) as part
of validating the 3.5.1 alpha RC proposed by [~michim]. I am seeing this from
time to time:
https://gist.github.com/rgs1/d11822799fdbbfa5d5f2
I only have IP addresses in zoo.cfg, yet this patch seems to be triggering a
reverse lookup (IP -> hostname). In my current setup (a test setup, with
systemd-nspawn containers) the hostnames returned by that reverse lookup don't
necessarily resolve forward again (i.e.: hostname -> IP fails), so
participants can end up unable to connect to the leader if it's initially
unavailable.
Is the reverse lookup (IP -> hostname) something expected with this patch or a
side effect? I don't see why we'd ever want/need that reverse lookup given that
it could be problematic in some setups.
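For reference, the distinction shows up directly in the JDK API: a minimal sketch (the address and port below are illustrative, not taken from my setup) of where a reverse lookup can sneak in when the config only contains IP literals:

```java
import java.net.InetSocketAddress;

public class ReverseLookupDemo {
    public static void main(String[] args) {
        // An InetSocketAddress built from a literal IP needs no DNS query.
        InetSocketAddress addr = new InetSocketAddress("10.0.0.1", 2888);

        // getHostString() returns the literal as given, never touching DNS.
        System.out.println(addr.getHostString()); // 10.0.0.1

        // getHostName() performs a reverse (IP -> hostname) lookup, which
        // can block or yield a name that doesn't resolve forward again,
        // e.g. inside systemd-nspawn containers without PTR records.
        System.out.println(addr.getHostName());
    }
}
```

So any code path that calls getHostName() (or logs/compares addresses by hostname) would explain the behavior above even with an IP-only zoo.cfg.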
Thoughts?
p.s.: I will post my entire, reproducible setup a bit later.
> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michi Mutsuzaki
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
>
>
> In our zoo.cfg we use hostnames to identify the ZK servers that are part of
> an ensemble. These hostnames are configured with a low (<= 60s) TTL and the
> IP address they map to can and does change. Our procedure for
> replacing/upgrading a ZK node is to boot an entirely new instance and remap
> the hostname to the new instance's IP address. Our expectation is that when
> the original ZK node is terminated/shutdown, the remaining nodes in the
> ensemble would reconnect to the new instance.
> However, what we are noticing is that the remaining ZK nodes do not attempt
> to re-resolve the hostname->IP mapping for the new server. Once the original
> ZK node is terminated, the existing servers continue to attempt contacting it
> at the old IP address. It would be great if the ZK servers could try to
> re-resolve the hostname when attempting to connect to a lost ZK server,
> instead of caching the lookup indefinitely. Currently we must do a rolling
> restart of the ZK ensemble after swapping a node -- which at three nodes
> means we periodically lose quorum.
> The exact method we are following is to boot new instances in EC2 and attach
> one of a set of three Elastic IP addresses. External to EC2 this IP address
> remains the same and maps to whatever instance it is attached to. Internal to
> EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped
> to the internal (10.x.y.z) address of the instance it is attached to.
> Therefore, in our case we would like ZK to pickup the new 10.x.y.z address
> that the elastic IP hostname gets mapped to and reconnect appropriately.
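
The re-resolution the quoted issue asks for can be sketched roughly as below. This is a hypothetical helper, not ZooKeeper's actual QuorumCnxManager code; the class and method names are illustrative:

```java
import java.net.InetSocketAddress;

// Hypothetical sketch: drop the cached address and resolve the configured
// hostname again before each reconnect attempt, so a remapped DNS record
// (e.g. an EC2 elastic IP hostname with a ~60s TTL) is picked up.
public class AddressRefresher {
    static InetSocketAddress refresh(InetSocketAddress stale) {
        // Constructing a fresh InetSocketAddress from the original host
        // string triggers a new forward (hostname -> IP) resolution,
        // subject to the JVM's networkaddress.cache.ttl setting.
        return new InetSocketAddress(stale.getHostString(), stale.getPort());
    }

    public static void main(String[] args) {
        InetSocketAddress a = new InetSocketAddress("localhost", 2888);
        System.out.println(refresh(a));
    }
}
```

Note that even with such a refresh, the JVM caches successful lookups according to the networkaddress.cache.ttl security property, so that setting has to cooperate with the DNS record's TTL.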
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)