[ 
https://issues.apache.org/jira/browse/CURATOR-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118530#comment-15118530
 ] 

huanhuan li commented on CURATOR-293:
-------------------------------------

Hi Jordan Zimmerman:
Let me state the problem more clearly. When the Curator client reconnects to 
ZK, it finds that the session has expired, so it resets and tries to create a 
new connection. Meanwhile, a DNS exception is thrown and creation of the new 
connection fails. As a result, the old connection is closed and the new one is 
never created. No watcher is registered on ZK at that point, so the connection 
to ZK can never recover. 

I think the Curator client could solve this problem by attempting to reconnect 
again when creation of the new connection fails.
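
The retry suggested above could be sketched roughly as follows. This is only 
an illustration, not Curator's actual code: ReconnectSketch, ConnectionFactory 
and createWithRetry are hypothetical names, and the sketch assumes that the DNS 
failure surfaces as a java.net.UnknownHostException while the new connection 
is being created.

```java
import java.net.UnknownHostException;

public class ReconnectSketch {

    /** Hypothetical stand-in for the step that creates a new ZooKeeper handle. */
    interface ConnectionFactory<T> {
        T create() throws Exception;
    }

    /**
     * Calls the factory until it succeeds or maxAttempts is reached.
     * Only UnknownHostException (assumed to be the DNS failure) is retried;
     * any other exception propagates immediately.
     */
    static <T> T createWithRetry(ConnectionFactory<T> factory,
                                 int maxAttempts,
                                 long sleepMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return factory.create();
            } catch (UnknownHostException e) {
                // DNS is not ready yet: remember the failure and try again.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMs);
                }
            }
        }
        // Every attempt failed with a DNS error; report the last one.
        throw last;
    }
}
```

With something like this, a failed recreation would no longer be final: the 
client would keep trying until the host names resolve again or the attempt 
limit is reached.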

> Curator can NOT reconnect after connection lost and session expired when the 
> connection come up while the DNS server is not ready yet.(zookeeper 
> connection string using domain names)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CURATOR-293
>                 URL: https://issues.apache.org/jira/browse/CURATOR-293
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.9.1
>            Reporter: huanhuan li
>            Priority: Critical
>         Attachments: CuratorConnectionLostEventTest.java
>
>
> 1. Add the following lines to /etc/hosts:
> x.x.x.x zk1.test.com
> x.x.x.x  zk2.test.com
> x.x.x.x  zk3.test.com
> 2. Run the test program
> 3. Shut down the network connection to x.x.x.x
> 4. Wait until the session expires (for example, 10 min)
> 5. Remove the 3 added lines from /etc/hosts
> 6. Restore the network connection to x.x.x.x
> 7. Observe that Curator cannot reconnect
> 8. Add the 3 lines back to /etc/hosts
> 9. Observe that Curator still cannot reconnect
> The log may look like the following:
> [main-SendThread(172.24.2.35:2181)][INFO ]2016-01-26 11:07:45.005 
> [ClientCnxn.logStartConnect] - Opening socket connection to server 
> 172.24.2.35/172.24.2.35:2181. Will not attempt to authenticate using SASL 
> (unknown error)
> [main-SendThread(172.24.2.35:2181)][INFO ]2016-01-26 11:07:45.050 
> [ClientCnxn.primeConnection] - Socket connection established to 
> 172.24.2.35/172.24.2.35:2181, initiating session
> [main-EventThread][WARN ]2016-01-26 11:07:45.093 
> [ConnectionState.handleExpiredSession] - Session expired event received
> [main-EventThread][DEBUG]2016-01-26 11:07:45.093 [ConnectionState.reset] - 
> reset
> [main-SendThread(172.24.2.35:2181)][INFO ]2016-01-26 11:07:45.093 
> [ClientCnxn.run] - Unable to reconnect to ZooKeeper service, session 
> 0x1525d9593a537af has expired, closing socket connection
> [main-EventThread][INFO ]2016-01-26 11:07:45.095 [ZooKeeper.<init>] - 
> Initiating client connection, 
> connectString=zk1.test.com:2181,zk2.test.com:2181,zk3.test.com:2181 
> sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@7e7d611f
> [main-EventThread][INFO ]2016-01-26 11:07:45.488 [ClientCnxn.run] - 
> EventThread shut down
> [main-SendThread(111.206.227.147:2181)][INFO ]2016-01-26 11:07:45.615 
> [ClientCnxn.logStartConnect] - Opening socket connection to server 
> 111.206.227.147/111.206.227.147:2181. Will not attempt to authenticate using 
> SASL (unknown error)
> [Curator-ConnectionStateManager-0][DEBUG]2016-01-26 11:07:58.523 
> [CuratorZookeeperClient.blockUntilConnectedOrTimedOut] - 
> blockUntilConnectedOrTimedOut() end. isConnected: false



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
