[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873730#comment-15873730
 ] 

Flavio Junqueira commented on ZOOKEEPER-2184:
---------------------------------------------

I haven't had much time to work on this issue, but here is my current 
assessment.

This issue seemed easy to fix at first, but it is fairly fundamental with
respect to how we resolve host names. Currently, we resolve host names when we
start a client and never resolve them again. This is the cause of the problem
reported in the issue: in the scenario described, the ZooKeeper container is
restarted and gets a new address, which prevents the client from connecting to
the ZooKeeper server.
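To illustrate the distinction with the plain JDK (this is not a claim about exactly how {{StaticHostProvider}} builds its addresses): {{new InetSocketAddress(host, port)}} attempts the lookup immediately and keeps whatever IP it got, while {{InetSocketAddress.createUnresolved(host, port)}} stores only the name and port. The class {{EagerVsDeferred}} and its {{resolveNow}} helper are hypothetical names used only for this sketch.

```java
import java.net.InetSocketAddress;

class EagerVsDeferred {
    // Build a resolved address from a deferred one at the moment of use,
    // e.g. right before a (hypothetical) connection attempt.
    static InetSocketAddress resolveNow(InetSocketAddress deferred) {
        return new InetSocketAddress(deferred.getHostString(), deferred.getPort());
    }

    public static void main(String[] args) {
        // Two-argument constructor: resolution is attempted right away and the
        // result is fixed for the lifetime of this object. An IP literal needs
        // no DNS query, so this example runs offline.
        InetSocketAddress eager = new InetSocketAddress("127.0.0.1", 2181);

        // createUnresolved: only the host string and port are stored.
        InetSocketAddress deferred = InetSocketAddress.createUnresolved("zk1.example", 2181);

        System.out.println(eager.isUnresolved());    // false
        System.out.println(deferred.isUnresolved()); // true
        System.out.println(deferred.getHostString() + ":" + deferred.getPort()); // zk1.example:2181
    }
}
```

If the lookup inside {{resolveNow}} fails, the JDK constructor does not throw; it returns an address flagged as unresolved, so a caller can check {{isUnresolved()}} and try again later.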

The proposed patch here tries to re-resolve the hostname every time the client 
fails to connect to the resolved address. It kind of works, but it makes 
{{StaticHostProvider}} a bit messy because the expectation with the current 
wiring is that we won't have to resolve again.
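A minimal sketch of the re-resolve-on-failure idea, assuming a hypothetical {{Resolver}} interface (injected so a DNS change can be simulated) and a hypothetical {{ReResolveOnFailureProvider}}; this is neither the attached patch nor ZooKeeper's API. It shows where the extra wiring comes from: the provider has to keep the host names as well as the addresses, and the connection loop has to report failures back to it.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical name-to-address lookup; production code might pass InetAddress::getByName. */
interface Resolver {
    InetAddress resolve(String host) throws UnknownHostException;
}

/**
 * Hypothetical host provider (not ZooKeeper's StaticHostProvider): every host is
 * resolved once up front, and a host is re-resolved only after a connection
 * attempt to its cached address fails.
 */
class ReResolveOnFailureProvider {
    private final List<String> hosts;
    private final int port;
    private final Resolver resolver;
    private final List<InetSocketAddress> cached = new ArrayList<>();
    private int current = -1;

    ReResolveOnFailureProvider(List<String> hosts, int port, Resolver resolver) {
        if (hosts.isEmpty()) throw new IllegalArgumentException("no hosts");
        this.hosts = new ArrayList<>(hosts);
        this.port = port;
        this.resolver = resolver;
        for (String h : this.hosts) {
            cached.add(lookup(h));
        }
    }

    private InetSocketAddress lookup(String host) {
        try {
            return new InetSocketAddress(resolver.resolve(host), port);
        } catch (UnknownHostException e) {
            // Keep the name so a later failure can trigger another lookup.
            return InetSocketAddress.createUnresolved(host, port);
        }
    }

    /** Round-robin over the cached addresses. */
    InetSocketAddress next() {
        current = (current + 1) % hosts.size();
        return cached.get(current);
    }

    /** Called by the (hypothetical) connection loop when connecting to the last next() failed. */
    void onConnectFailed() {
        cached.set(current, lookup(hosts.get(current)));
    }
}
```

A real lookup through {{InetAddress.getByName}} is also subject to the JVM's own DNS cache (the {{networkaddress.cache.ttl}} security property), so re-resolving in the client only helps if that cache does not keep serving the old address.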

The ideal situation for the problematic scenario is that we resolve the host 
name every time we try to connect to a server, but that would be a fairly 
fundamental change to how we resolve addresses in ZooKeeper. 
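For comparison, a sketch of the "resolve every time we connect" approach, again with hypothetical names ({{Lookup}}, {{ResolveEachAttemptProvider}}): the provider stores only unresolved host:port pairs and performs the lookup at the moment it hands out an address, so there is no cached IP that can go stale.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical injectable lookup; production code might pass InetAddress::getByName. */
interface Lookup {
    InetAddress lookup(String host) throws UnknownHostException;
}

/**
 * Hypothetical provider that keeps only names and ports and resolves a host
 * each time it is handed out for a connection attempt.
 */
class ResolveEachAttemptProvider {
    private final List<InetSocketAddress> servers = new ArrayList<>();
    private final Lookup lookup;
    private int current = -1;

    ResolveEachAttemptProvider(List<InetSocketAddress> addresses, Lookup lookup) {
        if (addresses.isEmpty()) throw new IllegalArgumentException("no hosts");
        for (InetSocketAddress a : addresses) {
            // Store only the name and port; never cache an IP here.
            servers.add(InetSocketAddress.createUnresolved(a.getHostString(), a.getPort()));
        }
        this.lookup = lookup;
    }

    /**
     * Returns a freshly resolved address for the next server, skipping hosts
     * that currently fail to resolve; returns null if none resolves.
     */
    InetSocketAddress next() {
        for (int tries = 0; tries < servers.size(); tries++) {
            current = (current + 1) % servers.size();
            InetSocketAddress s = servers.get(current);
            try {
                return new InetSocketAddress(lookup.lookup(s.getHostString()), s.getPort());
            } catch (UnknownHostException e) {
                // Not resolvable right now; move on to the next server.
            }
        }
        return null;
    }
}
```

The cost in this sketch is a name lookup on every connection attempt, in exchange for never holding a stale address.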

I also looked at the C client, and it might get a bit messy there too, because
I don't think we currently keep the association between the host name and the
resolved address, so we don't really know what to resolve again. It might be
possible to do it via the canonical name in {{getaddrinfo}}, but I'm not sure
how that works on Windows.

One specific proposal that prevents clients from never finding a server again,
without deep changes to the current wiring, is to resolve all host names again
when the client has tried every address and none succeeded. That would be a
fairly straightforward change to both the Java and C clients, but it would not
resolve addresses again in the case where a strict subset of the servers has
changed addresses and at least one server is still reachable.
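A sketch of that proposal under the same caveats: the names {{HostResolver}} and {{ReResolveAfterFullRoundProvider}} are hypothetical, and the connection loop is assumed to report the outcome of each attempt. Cached addresses are used as today, and everything is resolved again only after every host has failed in a row.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical injectable lookup; production code might pass InetAddress::getByName. */
interface HostResolver {
    InetAddress resolve(String host) throws UnknownHostException;
}

/**
 * Hypothetical provider: resolves all hosts up front and resolves all of them
 * again only once a full round of connection attempts has failed.
 */
class ReResolveAfterFullRoundProvider {
    private final List<String> hosts;
    private final int port;
    private final HostResolver resolver;
    private final List<InetSocketAddress> cached = new ArrayList<>();
    private int current = -1;
    private int failuresInRow = 0;

    ReResolveAfterFullRoundProvider(List<String> hosts, int port, HostResolver resolver) {
        if (hosts.isEmpty()) throw new IllegalArgumentException("no hosts");
        this.hosts = new ArrayList<>(hosts);
        this.port = port;
        this.resolver = resolver;
        resolveAll();
    }

    private void resolveAll() {
        cached.clear();
        for (String h : hosts) {
            try {
                cached.add(new InetSocketAddress(resolver.resolve(h), port));
            } catch (UnknownHostException e) {
                // Keep the name; the next full round of failures retries it.
                cached.add(InetSocketAddress.createUnresolved(h, port));
            }
        }
    }

    /** Round-robin over the cached addresses. */
    InetSocketAddress next() {
        current = (current + 1) % hosts.size();
        return cached.get(current);
    }

    /** Report a failed attempt; once every host has failed in a row, resolve them all again. */
    void onConnectFailed() {
        failuresInRow++;
        if (failuresInRow >= hosts.size()) {
            resolveAll();
            failuresInRow = 0;
        }
    }

    /** Report a successful connection; this resets the failure counter. */
    void onConnected() {
        failuresInRow = 0;
    }
}
```

The limitation mentioned above shows up in {{onConnected}}: as long as one cached address still accepts connections, the counter is reset, and a host whose address changed keeps its stale entry.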




> Zookeeper Client should re-resolve hosts when connection attempts fail
> ----------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2184
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2184
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.4.6, 3.5.0
>         Environment: Ubuntu 14.04 host, Docker containers for Zookeeper & 
> Kafka
>            Reporter: Robert P. Thille
>            Assignee: Flavio Junqueira
>              Labels: easyfix, patch
>             Fix For: 3.5.3, 3.4.11
>
>         Attachments: ZOOKEEPER-2184.patch
>
>
> Testing in a Docker environment with a single Kafka instance using a single 
> Zookeeper instance. Restarting the Zookeeper container will cause it to 
> receive a new IP address. Kafka will never be able to reconnect to Zookeeper 
> and will hang indefinitely. Updating DNS or /etc/hosts with the new IP 
> address will not help the client to reconnect as the 
> zookeeper/client/StaticHostProvider resolves the connection string hosts at 
> creation time and never re-resolves.
> A solution would be for the client to notice that connection attempts fail 
> and attempt to re-resolve the hostnames in the connectString.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
