Zoltan Szekeres created CURATOR-367:
---------------------------------------
Summary: Curator may deliver RECONNECTED before LOST in case of
session expiry
Key: CURATOR-367
URL: https://issues.apache.org/jira/browse/CURATOR-367
Project: Apache Curator
Issue Type: Bug
Components: Client
Affects Versions: 2.11.1
Reporter: Zoltan Szekeres
h2. Behaviour:
We saw our code blocked at client.blockUntilConnected() after reconnected on
session expiry.
h2. Possible reason:
After receiving a session expired event ConnectionState first resets the
connection then notifies the parent watchers, where the CuratorEvent is
created. In this case it seems the execution of the first zookeeper event
thread was delayed before calling the parent watchers. Meanwhile a new
zookeeper event thread was created due to calling reset and this new thread
sent the SyncConnected event earlier than SessionExpired was sent to parent
watchers in the first thread. This resulted ConnectionStateListener instances
seeing the RECONNECTED before the LOST.
h2. Logs:
2016-11-17T20:23:28.527Z [Thread-0-SendThread(] INFO ClientCnxn:
Opening socket connection to server _
2016-11-17T20:23:28.535Z [Thread-0-SendThread(] INFO ClientCnxn:
Socket connection established to _, initiating session
2016-11-17T20:23:28.576Z [Thread-0-SendThread(] INFO ClientCnxn:
Unable to reconnect to ZooKeeper service, session 0xc585ba1e7b6adc2 has
expired, closing socket connection
2016-11-17T20:23:28.576Z [Thread-0-EventThread] WARN ConnectionState:
Session expired event received
2016-11-17T20:23:28.673Z [Thread-0-EventThread] INFO ZooKeeper:
Initiating client connection, connectString=_ sessionTimeout=30000
watcher=org.apache.curator.ConnectionState@6ddf3f9d
2016-11-17T20:23:28.691Z [Thread-0-SendThread(] INFO ClientCnxn:
Opening socket connection to server _
2016-11-17T20:23:28.693Z [Thread-0-SendThread(] INFO ClientCnxn:
Socket connection established to _, initiating session
2016-11-17T20:23:28.701Z [Thread-0-SendThread(] INFO ClientCnxn:
Session establishment complete on server _, sessionid = 0x2585ba1e69ffeca,
negotiated timeout = 30000
2016-11-17T20:23:28.701Z [Thread-0-EventThread] INFO ConnectionStateManager:
State change: RECONNECTED
2016-11-17T20:23:28.715Z [Thread-0-EventThread] INFO ConnectionStateManager:
State change: LOST
2016-11-17T20:23:28.715Z [Thread-0-EventThread] INFO ClientCnxn:
EventThread shut down
h2. Reproduction:
I was only able to reproduce the behaviour by adding artificial Thread.sleep in
ConnectionState#process before calling the parent watchers if the event is
session expired.
{code:title=ConnectionState#process}
@Override
public void process(WatchedEvent event)
{
if ( LOG_EVENTS )
{
log.debug("ConnectState watcher: " + event);
}
if ( event.getType() == Watcher.Event.EventType.None )
{
boolean wasConnected = isConnected.get();
boolean newIsConnected = checkState(event.getState(),
wasConnected);
if ( newIsConnected != wasConnected )
{
isConnected.set(newIsConnected);
connectionStartMs = System.currentTimeMillis();
}
}
if (event.getState() == KeeperState.Expired)
{
System.err.println("::> sleep in ConnectionState#process");
try {
Thread.sleep(1000);
} catch (InterruptedException e) {}
}
for ( Watcher parentWatcher : parentWatchers )
{
TimeTrace timeTrace = new
TimeTrace("connection-state-parent-process", tracer.get());
parentWatcher.process(event);
timeTrace.commit();
}
}
{code}
h2. Some ideas for fix:
* Add the event handling and calling parent watchers into a synchronized block.
* Change the order of handling watched event and calling parent watchers (I'm
not aware of the behaviour implications of this).
* Move only calling reset to the end of the method "process".
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)