Zoltan Szekeres created CURATOR-367:
---------------------------------------

             Summary: Curator may deliver RECONNECTED before LOST in case of 
session expiry
                 Key: CURATOR-367
                 URL: https://issues.apache.org/jira/browse/CURATOR-367
             Project: Apache Curator
          Issue Type: Bug
          Components: Client
    Affects Versions: 2.11.1
            Reporter: Zoltan Szekeres


h2. Behaviour:
We saw our code blocked at client.blockUntilConnected() after reconnected on 
session expiry.

h2. Possible reason:
After receiving a session expired event ConnectionState first resets the 
connection then notifies the parent watchers, where the CuratorEvent is 
created. In this case it seems the execution of the first zookeeper event 
thread was delayed before calling the parent watchers. Meanwhile a new 
zookeeper event thread was created due to calling reset and this new thread 
sent the SyncConnected event earlier than SessionExpired was sent to parent 
watchers in the first thread. This resulted ConnectionStateListener instances 
seeing the RECONNECTED before the LOST.

h2. Logs:
2016-11-17T20:23:28.527Z [Thread-0-SendThread(]  INFO              ClientCnxn: 
Opening socket connection to server _
2016-11-17T20:23:28.535Z [Thread-0-SendThread(]  INFO              ClientCnxn: 
Socket connection established to _, initiating session
2016-11-17T20:23:28.576Z [Thread-0-SendThread(]  INFO              ClientCnxn: 
Unable to reconnect to ZooKeeper service, session 0xc585ba1e7b6adc2 has 
expired, closing socket connection
2016-11-17T20:23:28.576Z [Thread-0-EventThread]  WARN         ConnectionState: 
Session expired event received
2016-11-17T20:23:28.673Z [Thread-0-EventThread]  INFO               ZooKeeper: 
Initiating client connection, connectString=_ sessionTimeout=30000 
watcher=org.apache.curator.ConnectionState@6ddf3f9d
2016-11-17T20:23:28.691Z [Thread-0-SendThread(]  INFO              ClientCnxn: 
Opening socket connection to server _
2016-11-17T20:23:28.693Z [Thread-0-SendThread(]  INFO              ClientCnxn: 
Socket connection established to _, initiating session
2016-11-17T20:23:28.701Z [Thread-0-SendThread(]  INFO              ClientCnxn: 
Session establishment complete on server _, sessionid = 0x2585ba1e69ffeca, 
negotiated timeout = 30000
2016-11-17T20:23:28.701Z [Thread-0-EventThread]  INFO  ConnectionStateManager: 
State change: RECONNECTED
2016-11-17T20:23:28.715Z [Thread-0-EventThread]  INFO  ConnectionStateManager: 
State change: LOST
2016-11-17T20:23:28.715Z [Thread-0-EventThread]  INFO              ClientCnxn: 
EventThread shut down

h2. Reproduction:
I was only able to reproduce the behaviour by adding artificial Thread.sleep in 
ConnectionState#process before calling the parent watchers if the event is 
session expired.
{code:title=ConnectionState#process}
@Override
public void process(WatchedEvent event)
{
        if ( LOG_EVENTS )
        {
                log.debug("ConnectState watcher: " + event);
        }

        if ( event.getType() == Watcher.Event.EventType.None )
        {
                boolean wasConnected = isConnected.get();
                boolean newIsConnected = checkState(event.getState(), 
wasConnected);
                if ( newIsConnected != wasConnected )
                {
                        isConnected.set(newIsConnected);
                        connectionStartMs = System.currentTimeMillis();
                }
        }

        if (event.getState() == KeeperState.Expired)
        {
                System.err.println("::> sleep in ConnectionState#process");
                try {
                        Thread.sleep(1000);
                } catch (InterruptedException e) {}
        }

        for ( Watcher parentWatcher : parentWatchers )
        {
                TimeTrace timeTrace = new 
TimeTrace("connection-state-parent-process", tracer.get());
                parentWatcher.process(event);
                timeTrace.commit();
        }
}
{code}

h2. Some ideas for fix:
 * Add the event handling and calling parent watchers into a synchronized block.
 * Change the order of handling watched event and calling parent watchers (I'm 
not aware of the behaviour implications of this).
 * Move only calling reset to the end of the method "process".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to