[ 
https://issues.apache.org/jira/browse/CURATOR-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855024#comment-15855024
 ] 

ASF GitHub Bot commented on CURATOR-367:
----------------------------------------

Github user cammckenzie commented on the issue:

    https://github.com/apache/curator/pull/197
  
    I've run through some more tests and it all looks good.
    
    My only concern is that we're introducing the Mockito just for this test 
case. Could you move the test case into the curator-client sub package under 
org.apache.curator. Then no reflection would be required to set the debug flag 
and we could remove the mockito dependency?
    
    Do you have an opinion on introducing the Mockito dependency @Randgalt?
    
    In regards to the Curator 2.x and 3.x, don't worry about a separate PR, it 
should be ok to just merge from the Curator 2.x branch into 3.x. 
    



> Curator may deliver RECONNECTED before LOST in case of session expiry
> ---------------------------------------------------------------------
>
>                 Key: CURATOR-367
>                 URL: https://issues.apache.org/jira/browse/CURATOR-367
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.11.1
>            Reporter: Zoltan Szekeres
>
> h2. Behaviour:
> We saw our code blocked at client.blockUntilConnected() after reconnected on 
> session expiry.
> h2. Possible reason:
> After receiving a session expired event ConnectionState first resets the 
> connection then notifies the parent watchers, where the CuratorEvent is 
> created. In this case it seems the execution of the first zookeeper event 
> thread was delayed before calling the parent watchers. Meanwhile a new 
> zookeeper event thread was created due to calling reset and this new thread 
> sent the SyncConnected event earlier than SessionExpired was sent to parent 
> watchers in the first thread. This resulted ConnectionStateListener instances 
> seeing the RECONNECTED before the LOST.
> h2. Logs:
> 2016-11-17T20:23:28.527Z [Thread-0-SendThread(]  INFO              
> ClientCnxn: Opening socket connection to server _
> 2016-11-17T20:23:28.535Z [Thread-0-SendThread(]  INFO              
> ClientCnxn: Socket connection established to _, initiating session
> 2016-11-17T20:23:28.576Z [Thread-0-SendThread(]  INFO              
> ClientCnxn: Unable to reconnect to ZooKeeper service, session 
> 0xc585ba1e7b6adc2 has expired, closing socket connection
> 2016-11-17T20:23:28.576Z [Thread-0-EventThread]  WARN         
> ConnectionState: Session expired event received
> 2016-11-17T20:23:28.673Z [Thread-0-EventThread]  INFO               
> ZooKeeper: Initiating client connection, connectString=_ sessionTimeout=30000 
> watcher=org.apache.curator.ConnectionState@6ddf3f9d
> 2016-11-17T20:23:28.691Z [Thread-0-SendThread(]  INFO              
> ClientCnxn: Opening socket connection to server _
> 2016-11-17T20:23:28.693Z [Thread-0-SendThread(]  INFO              
> ClientCnxn: Socket connection established to _, initiating session
> 2016-11-17T20:23:28.701Z [Thread-0-SendThread(]  INFO              
> ClientCnxn: Session establishment complete on server _, sessionid = 
> 0x2585ba1e69ffeca, negotiated timeout = 30000
> 2016-11-17T20:23:28.701Z [Thread-0-EventThread]  INFO  
> ConnectionStateManager: State change: RECONNECTED
> 2016-11-17T20:23:28.715Z [Thread-0-EventThread]  INFO  
> ConnectionStateManager: State change: LOST
> 2016-11-17T20:23:28.715Z [Thread-0-EventThread]  INFO              
> ClientCnxn: EventThread shut down
> h2. Reproduction:
> I was only able to reproduce the behaviour by adding artificial Thread.sleep 
> in ConnectionState#process before calling the parent watchers if the event is 
> session expired.
> {code:title=ConnectionState#process}
> @Override
> public void process(WatchedEvent event)
> {
>       if ( LOG_EVENTS )
>       {
>               log.debug("ConnectState watcher: " + event);
>       }
>       if ( event.getType() == Watcher.Event.EventType.None )
>       {
>               boolean wasConnected = isConnected.get();
>               boolean newIsConnected = checkState(event.getState(), 
> wasConnected);
>               if ( newIsConnected != wasConnected )
>               {
>                       isConnected.set(newIsConnected);
>                       connectionStartMs = System.currentTimeMillis();
>               }
>       }
>       if (event.getState() == KeeperState.Expired)
>       {
>               System.err.println("::> sleep in ConnectionState#process");
>               try {
>                       Thread.sleep(1000);
>               } catch (InterruptedException e) {}
>       }
>       for ( Watcher parentWatcher : parentWatchers )
>       {
>               TimeTrace timeTrace = new 
> TimeTrace("connection-state-parent-process", tracer.get());
>               parentWatcher.process(event);
>               timeTrace.commit();
>       }
> }
> {code}
> h2. Some ideas for fix:
>  * Add the event handling and calling parent watchers into a synchronized 
> block.
>  * Change the order of handling watched event and calling parent watchers 
> (I'm not aware of the behaviour implications of this).
>  * Move only calling reset to the end of the method "process".



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to