[
https://issues.apache.org/jira/browse/ZOOKEEPER-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667946#comment-16667946
]
yangwei commented on ZOOKEEPER-2219:
------------------------------------
https://issues.apache.org/jira/browse/ZOOKEEPER-2886 fixed
> ZooKeeper server should better handle SessionMovedException.
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-2219
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2219
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: zhihai xu
> Priority: Major
>
> ZooKeeper server should better handle SessionMovedException.
> We hit the SessionMovedException. the following is the reason for the
> SessionMovedException we find:
> 1. ZK client tried to connect to Leader L. Network was very slow, so before
> leader processed the request, client disconnected.
> 2. Client then re-connected to Follower F reusing the same session ID. It was
> successful.
> 3. The request in step 1 went into leader. Leader processed it and
> invalidated the connection created in step 2. But client didn't know the
> connection it used is invalidated.
> 4. Client got SessionMovedException when it used the connection invalidated
> by leader for any ZooKeeper operation.
> The following are logs: c045dkh is the Leader, c470udy is the Follower and
> the sessionID is 0x14be28f50f4419d.
> 1. ZK client try to initiate session to Leader at 015-03-16 10:59:40,735 and
> timeout after 10/3 seconds.
> logs from ZK client
> {code}
> 2015-03-16 10:59:40,078 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 6670ms for sessionid
> 0x14be28f50f4419d, closing socket connection and attempting reconnect
> 015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server c045dkh/?.?.?.67:2181. Will not attempt to authenticate
> using SASL (unknown error)
> 2015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to c045dkh/?.?.?.67:2181, initiating session
> 2015-03-16 10:59:44,071 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 3336ms for sessionid
> 0x14be28f50f4419d, closing socket connection and attempting reconnect
> {code}
> 2. ZK client initiated session to Follower successfully at 2015-03-16
> 10:59:44,688
> logs from ZK client
> {code}
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server c470udy/?.?.?.65:2181. Will not attempt to authenticate
> using SASL (unknown error)
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to c470udy/?.?.?.65:2181, initiating session
> 2015-03-16 10:59:44,688 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server c470udy/?.?.?.65:2181, sessionid =
> 0x14be28f50f4419d, negotiated timeout = 10000
> {code}
> logs from ZK Follower server
> {code}
> 2015-03-16 10:59:44,673 INFO
> org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection
> from /?.?.?.65:42777
> 2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.ZooKeeperServer:
> Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:42777
> 2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.quorum.Learner:
> Revalidating client: 0x14be28f50f4419d
> 2015-03-16 10:59:44,675 INFO org.apache.zookeeper.server.ZooKeeperServer:
> Established session 0x14be28f50f4419d with negotiated timeout 10000 for
> client /?.?.?.65:42777
> {code}
> 3. At 2015-03-16 10:59:45,668, Leader processed the delayed request which is
> sent by Client at 2015-03-16 10:59:40,735, right after session was
> established, it close the socket connection/session because client was
> already disconnected due to timeout.
> logs from ZK Leader server
> {code}
> 2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer:
> Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:50271
> 2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer:
> Established session 0x14be28f50f4419d with negotiated timeout 10000 for
> client /?.?.?.65:50271
> 2015-03-16 10:59:45,670 WARN org.apache.zookeeper.server.NIOServerCnxn:
> Exception causing close of session 0x14be28f50f4419d due to
> java.io.IOException: Broken pipe
> 2015-03-16 10:59:45,671 INFO org.apache.zookeeper.server.NIOServerCnxn:
> Closed socket connection for client /?.?.?.65:50271 which had sessionid
> 0x14be28f50f4419d
> {code}
> 4. Client got SessionMovedException at 2015-03-16 10:59:45,693
> logs from ZK Leader server
> {code}
> 2015-03-16 10:59:45,693 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x14be28f50f4419d type:multi
> cxid:0x86e3 zxid:0x1c002a4e53 txntype:-1 reqpath:n/a aborting remaining multi
> ops. Error Path:null Error:KeeperErrorCode = Session moved
> 2015-03-16 10:59:45,695 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x14be28f50f4419d type:multi
> cxid:0x86e5 zxid:0x1c002a4e56 txntype:-1 reqpath:n/a aborting remaining multi
> ops. Error Path:null Error:KeeperErrorCode = Session moved
> 2015-03-16 10:59:45,700 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x14be28f50f4419d type:multi
> cxid:0x86e7 zxid:0x1c002a4e57 txntype:-1 reqpath:n/a aborting remaining multi
> ops. Error Path:null Error:KeeperErrorCode = Session moved
> {code}
> 5. At 2015-03-16 10:59:45,710, we close the session 0x14be28f50f4419d but the
> socket connection between ZK client and ZK Follower is closed at 2015-03-16
> 10:59:45,715 after session termination.
> logs from ZK Leader server:
> {code}
> 2015-03-16 10:59:45,710 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Processed session
> termination for sessionid: 0x14be28f50f4419d
> {code}
> logs from ZK Follower server:
> {code}
> 2015-03-16 10:59:45,715 INFO org.apache.zookeeper.server.NIOServerCnxn:
> Closed socket connection for client /?.?.?.65:42777 which had sessionid
> 0x14be28f50f4419d
> {code}
> It looks like Zk client is out-of-sync with ZK server.
> My question is how ZK client can recover from this error. It looks like the
> ZK Client won't be disconnected from Follower until session is closed and any
> ZooKeeper operation will fail with SessionMovedException before session is
> closed.
> Also since ZK Leader already closed the socket connection/session to ZK
> Client at step 3, why it still reject the ZooKeeper operation from client
> with SessionMovedException. Will it be better to endorse the
> session/connection between ZK client and ZK Follower? This seems like a bug
> to me.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)