Jonathan Park created ZOOKEEPER-3280:
----------------------------------------

             Summary: ClientCnxn xid rollover can break sessions
                 Key: ZOOKEEPER-3280
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3280
             Project: ZooKeeper
          Issue Type: Bug
          Components: java client
    Affects Versions: 3.4.12, 3.4.6
            Reporter: Jonathan Park
         Attachments: ZKOutOfOrder.java

 
{code:java}
181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Xid out of order. Got Xid -3 with err 0 expected Xid -4 
for a packet with details: clientPath:null serverPath:null finished:false 
header:: -4,8 replyHeader:: 0,0,-4 request:: '/,F response:: v{} 
at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:828)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
2019-02-15 13:40:22,520 [myid:] - INFO 
[main-SendThread(localhost:2181):ClientCnxn$SendThread@1027] - Opening socket 
connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate 
using SASL (unknown error)
2019-02-15 13:40:22,521 [myid:] - INFO 
[main-SendThread(localhost:2181):ClientCnxn$SendThread@877] - Socket connection 
established to localhost/127.0.0.1:2181, initiating session
2019-02-15 13:40:22,521 [myid:] - DEBUG 
[main-SendThread(localhost:2181):ClientCnxn$SendThread@950] - Session 
establishment request sent on localhost/127.0.0.1:2181
2019-02-15 13:40:22,522 [myid:] - INFO 
[main-SendThread(localhost:2181):ClientCnxn$SendThread@1301] - Session 
establishment complete on server localhost/127.0.0.1:2181, sessionid = 
0x168f2c5e9c60017, negotiated timeout = 30000
2019-02-15 13:40:22,525 [myid:] - DEBUG 
[main-SendThread(localhost:2181):ClientCnxn$SendThread@742] - Got ping response 
for sessionid: 0x168f2c5e9c60017 after 235329552ms
{code}
ClientCnxn xid's are tracked as java int's. For long-lived ZK clients this can 
lead to rollover into the negative xid space. Xid = -4 is treated as a special 
xid reserved for auth requests. With xid rollover, a normal ZK request can also 
have xid = -4 but the response will be treated as an auth response making 
subsequent packet processing error with the exception above. We can reproduce 
this more readily by changing the starting xid in ClientCnxn from 1 to -100. 
The ZK client will transparently reconnect and establish a new session but 
features that depended on the same session persisting will unnecessarily 
experience a disconnected event.

 

I've attached simple class with a main() method that reproduces the failure 
quickly against a local ZK server after modifying the initial value of 
ClientCnxn.xid from 1 to -100. 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to