zhihai xu created ZOOKEEPER-2219:
------------------------------------

             Summary: ZooKeeper server should better handle 
SessionMovedException.
                 Key: ZOOKEEPER-2219
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2219
             Project: ZooKeeper
          Issue Type: Bug
    Affects Versions: 3.4.5
            Reporter: zhihai xu


ZooKeeper server should better handle SessionMovedException.
We hit the SessionMovedException. the following is the reason for the 
SessionMovedException we find:
1. ZK client tried to connect to Leader L. Network was very slow, so before 
leader processed the request, client disconnected.
2. Client then re-connected to Follower F reusing the same session ID. It was 
successful.
3. The request in step 1 went into leader. Leader processed it and invalidated 
the connection created in step 2. But client didn't know the connection it used 
is invalidated.
4. Client got SessionMovedException when it used the connection invalidated by 
leader for any ZooKeeper operation.

The following are logs: c045dkh is the Leader, c470udy is the Follower and the 
sessionID is 0x14be28f50f4419d.
1. ZK client try to initiate session to Leader at 015-03-16 10:59:40,735 and 
timeout after 10/3 seconds.
logs from ZK client 
{code}
2015-03-16 10:59:40,078 INFO org.apache.zookeeper.ClientCnxn: Client session 
timed out, have not heard from server in 6670ms for sessionid 
0x14be28f50f4419d, closing socket connection and attempting reconnect
015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server c045dkh/?.?.?.67:2181. Will not attempt to authenticate 
using SASL (unknown error)
2015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to c045dkh/?.?.?.67:2181, initiating session
2015-03-16 10:59:44,071 INFO org.apache.zookeeper.ClientCnxn: Client session 
timed out, have not heard from server in 3336ms for sessionid 
0x14be28f50f4419d, closing socket connection and attempting reconnect
{code}

2. ZK client initiated session to Follower successfully at 2015-03-16 
10:59:44,688
logs from ZK client
{code}
2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server c470udy/?.?.?.65:2181. Will not attempt to authenticate 
using SASL (unknown error)
2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to c470udy/?.?.?.65:2181, initiating session
2015-03-16 10:59:44,688 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server c470udy/?.?.?.65:2181, sessionid = 
0x14be28f50f4419d, negotiated timeout = 10000
{code}
logs from ZK Follower server
{code}
2015-03-16 10:59:44,673 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: 
Accepted socket connection from /?.?.?.65:42777
2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.ZooKeeperServer: 
Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:42777
2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.quorum.Learner: 
Revalidating client: 0x14be28f50f4419d
2015-03-16 10:59:44,675 INFO org.apache.zookeeper.server.ZooKeeperServer: 
Established session 0x14be28f50f4419d with negotiated timeout 10000 for client 
/?.?.?.65:42777
{code}

3. At 2015-03-16 10:59:45,668, Leader processed the delayed request which is 
sent by Client at 2015-03-16 10:59:44,688, right after session was established, 
it close the socket connection/session because client was already disconnected 
due to timeout.
logs from ZK Leader server
{code}
2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: 
Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:50271
2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: 
Established session 0x14be28f50f4419d with negotiated timeout 10000 for client 
/?.?.?.65:50271
2015-03-16 10:59:45,670 WARN org.apache.zookeeper.server.NIOServerCnxn: 
Exception causing close of session 0x14be28f50f4419d due to 
java.io.IOException: Broken pipe
2015-03-16 10:59:45,671 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed 
socket connection for client /?.?.?.65:50271 which had sessionid 
0x14be28f50f4419d
{code}

4. Client got SessionMovedException at 2015-03-16 10:59:45,693
logs from ZK Leader server
{code}
2015-03-16 10:59:45,693 INFO org.apache.zookeeper.server.PrepRequestProcessor: 
Got user-level KeeperException when processing sessionid:0x14be28f50f4419d 
type:multi cxid:0x86e3 zxid:0x1c002a4e53 txntype:-1 reqpath:n/a aborting 
remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
2015-03-16 10:59:45,695 INFO org.apache.zookeeper.server.PrepRequestProcessor: 
Got user-level KeeperException when processing sessionid:0x14be28f50f4419d 
type:multi cxid:0x86e5 zxid:0x1c002a4e56 txntype:-1 reqpath:n/a aborting 
remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
2015-03-16 10:59:45,700 INFO org.apache.zookeeper.server.PrepRequestProcessor: 
Got user-level KeeperException when processing sessionid:0x14be28f50f4419d 
type:multi cxid:0x86e7 zxid:0x1c002a4e57 txntype:-1 reqpath:n/a aborting 
remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
{code}

5. At 2015-03-16 10:59:45,710, we close the session 0x14be28f50f4419d but the 
socket connection between ZK client and ZK Follower is closed at 2015-03-16 
10:59:45,715 after session termination.
logs from ZK Leader server:
{code}
2015-03-16 10:59:45,710 INFO org.apache.zookeeper.server.PrepRequestProcessor: 
Processed session termination for sessionid: 0x14be28f50f4419d
{code}
logs from ZK Follower server:
{code}
2015-03-16 10:59:45,715 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed 
socket connection for client /?.?.?.65:42777 which had sessionid 
0x14be28f50f4419d
{code}

It looks like Zk client is out-of-sync with ZK server.
My question is how ZK client can recover from this error. It looks like the ZK 
Client won't be disconnected from Follower until session is closed and any 
ZooKeeper operation will fail with SessionMovedException before session is 
closed.
Also since ZK Leader already closed the socket connection/session to ZK Client 
at step 3, why it still reject the ZooKeeper operation from client with 
SessionMovedException. Will it be better to endorse the session/connection 
between ZK client and ZK Follower? This seems like a bug to me. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to