[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667946#comment-16667946
 ] 

yangwei commented on ZOOKEEPER-2219:
------------------------------------

https://issues.apache.org/jira/browse/ZOOKEEPER-2886 fixed

> ZooKeeper server should better handle SessionMovedException.
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2219
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2219
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.5
>            Reporter: zhihai xu
>            Priority: Major
>
> ZooKeeper server should better handle SessionMovedException.
> We hit the SessionMovedException. the following is the reason for the 
> SessionMovedException we find:
> 1. ZK client tried to connect to Leader L. Network was very slow, so before 
> leader processed the request, client disconnected.
> 2. Client then re-connected to Follower F reusing the same session ID. It was 
> successful.
> 3. The request in step 1 went into leader. Leader processed it and 
> invalidated the connection created in step 2. But client didn't know the 
> connection it used is invalidated.
> 4. Client got SessionMovedException when it used the connection invalidated 
> by leader for any ZooKeeper operation.
> The following are logs: c045dkh is the Leader, c470udy is the Follower and 
> the sessionID is 0x14be28f50f4419d.
> 1. ZK client try to initiate session to Leader at 015-03-16 10:59:40,735 and 
> timeout after 10/3 seconds.
> logs from ZK client 
> {code}
> 2015-03-16 10:59:40,078 INFO org.apache.zookeeper.ClientCnxn: Client session 
> timed out, have not heard from server in 6670ms for sessionid 
> 0x14be28f50f4419d, closing socket connection and attempting reconnect
> 015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
> connection to server c045dkh/?.?.?.67:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Socket 
> connection established to c045dkh/?.?.?.67:2181, initiating session
> 2015-03-16 10:59:44,071 INFO org.apache.zookeeper.ClientCnxn: Client session 
> timed out, have not heard from server in 3336ms for sessionid 
> 0x14be28f50f4419d, closing socket connection and attempting reconnect
> {code}
> 2. ZK client initiated session to Follower successfully at 2015-03-16 
> 10:59:44,688
> logs from ZK client
> {code}
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
> connection to server c470udy/?.?.?.65:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Socket 
> connection established to c470udy/?.?.?.65:2181, initiating session
> 2015-03-16 10:59:44,688 INFO org.apache.zookeeper.ClientCnxn: Session 
> establishment complete on server c470udy/?.?.?.65:2181, sessionid = 
> 0x14be28f50f4419d, negotiated timeout = 10000
> {code}
> logs from ZK Follower server
> {code}
> 2015-03-16 10:59:44,673 INFO 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection 
> from /?.?.?.65:42777
> 2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.ZooKeeperServer: 
> Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:42777
> 2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.quorum.Learner: 
> Revalidating client: 0x14be28f50f4419d
> 2015-03-16 10:59:44,675 INFO org.apache.zookeeper.server.ZooKeeperServer: 
> Established session 0x14be28f50f4419d with negotiated timeout 10000 for 
> client /?.?.?.65:42777
> {code}
> 3. At 2015-03-16 10:59:45,668, Leader processed the delayed request which is 
> sent by Client at 2015-03-16 10:59:40,735, right after session was 
> established, it close the socket connection/session because client was 
> already disconnected due to timeout.
> logs from ZK Leader server
> {code}
> 2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: 
> Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:50271
> 2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: 
> Established session 0x14be28f50f4419d with negotiated timeout 10000 for 
> client /?.?.?.65:50271
> 2015-03-16 10:59:45,670 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x14be28f50f4419d due to 
> java.io.IOException: Broken pipe
> 2015-03-16 10:59:45,671 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> Closed socket connection for client /?.?.?.65:50271 which had sessionid 
> 0x14be28f50f4419d
> {code}
> 4. Client got SessionMovedException at 2015-03-16 10:59:45,693
> logs from ZK Leader server
> {code}
> 2015-03-16 10:59:45,693 INFO 
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level 
> KeeperException when processing sessionid:0x14be28f50f4419d type:multi 
> cxid:0x86e3 zxid:0x1c002a4e53 txntype:-1 reqpath:n/a aborting remaining multi 
> ops. Error Path:null Error:KeeperErrorCode = Session moved
> 2015-03-16 10:59:45,695 INFO 
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level 
> KeeperException when processing sessionid:0x14be28f50f4419d type:multi 
> cxid:0x86e5 zxid:0x1c002a4e56 txntype:-1 reqpath:n/a aborting remaining multi 
> ops. Error Path:null Error:KeeperErrorCode = Session moved
> 2015-03-16 10:59:45,700 INFO 
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level 
> KeeperException when processing sessionid:0x14be28f50f4419d type:multi 
> cxid:0x86e7 zxid:0x1c002a4e57 txntype:-1 reqpath:n/a aborting remaining multi 
> ops. Error Path:null Error:KeeperErrorCode = Session moved
> {code}
> 5. At 2015-03-16 10:59:45,710, we close the session 0x14be28f50f4419d but the 
> socket connection between ZK client and ZK Follower is closed at 2015-03-16 
> 10:59:45,715 after session termination.
> logs from ZK Leader server:
> {code}
> 2015-03-16 10:59:45,710 INFO 
> org.apache.zookeeper.server.PrepRequestProcessor: Processed session 
> termination for sessionid: 0x14be28f50f4419d
> {code}
> logs from ZK Follower server:
> {code}
> 2015-03-16 10:59:45,715 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> Closed socket connection for client /?.?.?.65:42777 which had sessionid 
> 0x14be28f50f4419d
> {code}
> It looks like Zk client is out-of-sync with ZK server.
> My question is how ZK client can recover from this error. It looks like the 
> ZK Client won't be disconnected from Follower until session is closed and any 
> ZooKeeper operation will fail with SessionMovedException before session is 
> closed.
> Also since ZK Leader already closed the socket connection/session to ZK 
> Client at step 3, why it still reject the ZooKeeper operation from client 
> with SessionMovedException. Will it be better to endorse the 
> session/connection between ZK client and ZK Follower? This seems like a bug 
> to me. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to