[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1506:
---------------------------------------
    Description: 
In our zoo.cfg we use hostnames to identify the ZK servers that are part of 
an ensemble. These hostnames are configured with a low (<= 60s) TTL and the IP 
address they map to can and does change. Our procedure for replacing/upgrading 
a ZK node is to boot an entirely new instance and remap the hostname to the new 
instance's IP address. Our expectation is that when the original ZK node is 
terminated/shut down, the remaining nodes in the ensemble would reconnect to 
the new instance.

However, we are noticing that the remaining ZK nodes do not attempt to 
re-resolve the hostname->IP mapping for the new server. Once the original ZK 
node is terminated, the existing servers continue attempting to contact it at 
the old IP address. It would be great if the ZK servers could re-resolve the 
hostname when attempting to connect to a lost ZK server, instead of caching 
the lookup indefinitely. Currently we must do a rolling restart of the ZK 
ensemble after swapping a node -- which, at three nodes, means we periodically 
lose quorum.
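As a rough sketch of the behavior we are asking for (the class and method names 
below are illustrative, not ZooKeeper's actual internals): rather than caching 
the resolved peer address for the lifetime of the process, the server could 
construct a fresh java.net.InetSocketAddress from the configured hostname 
before each connection attempt, which triggers a new DNS lookup (subject to 
the JVM's own networkaddress.cache.ttl setting).

```java
import java.net.InetSocketAddress;

// Hypothetical sketch: re-resolve the peer hostname on every connect attempt
// instead of holding on to a single, possibly stale, resolved address.
public class ReResolvingAddress {
    private final String hostname;
    private final int port;

    public ReResolvingAddress(String hostname, int port) {
        this.hostname = hostname;
        this.port = port;
    }

    // Called before each (re)connect attempt. Building a fresh
    // InetSocketAddress performs a new hostname lookup rather than
    // reusing the result cached at construction time.
    public InetSocketAddress resolve() {
        return new InetSocketAddress(hostname, port);
    }
}
```

With something like this, a peer whose hostname was remapped to a new IP would 
be found on the next reconnect attempt, once the DNS TTL expires.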

The exact method we follow is to boot new instances in EC2 and attach one of a 
set of three Elastic IP addresses. External to EC2 this IP address remains the 
same and maps to whichever instance it is attached to. Internal to EC2, the 
Elastic IP hostname has a TTL of about 45-60 seconds and is remapped to the 
internal (10.x.y.z) address of the instance it is attached to. Therefore, in 
our case we would like ZK to pick up the new 10.x.y.z address that the Elastic 
IP hostname gets mapped to and reconnect appropriately.



> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Michi Mutsuzaki
>            Priority: Critical
>              Labels: patch
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> zk-dns-caching-refresh.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)