[ 
https://issues.apache.org/jira/browse/CURATOR-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888982#comment-15888982
 ] 

ASF GitHub Bot commented on CURATOR-367:
----------------------------------------

Github user Randgalt commented on a diff in the pull request:

    https://github.com/apache/curator/pull/197#discussion_r103568028
  
    --- Diff: 
curator-client/src/main/java/org/apache/curator/ConnectionState.java ---
    @@ -283,12 +300,19 @@ private boolean checkState(Event.KeeperState state, 
boolean wasConnected)
                 new EventTrace(state.toString(), tracer.get(), 
getSessionId()).commit();
             }
     
    -        if ( checkNewConnectionString && 
zooKeeper.hasNewConnectionString() )
    +        return isConnected;
    +    }
    +
    +    private void handleState(Event.KeeperState state)
    +    {
    +        if (state == Event.KeeperState.Expired)
    +        {
    +            handleExpiredSession();
    --- End diff --
    
    Oh - I think you're right. The old version set 
`checkNewConnectionString=false`


> Curator may deliver RECONNECTED before LOST in case of session expiry
> ---------------------------------------------------------------------
>
>                 Key: CURATOR-367
>                 URL: https://issues.apache.org/jira/browse/CURATOR-367
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.11.1
>            Reporter: Zoltan Szekeres
>
> h2. Behaviour:
> We saw our code blocked at client.blockUntilConnected() after reconnected on 
> session expiry.
> h2. Possible reason:
> After receiving a session expired event ConnectionState first resets the 
> connection then notifies the parent watchers, where the CuratorEvent is 
> created. In this case it seems the execution of the first zookeeper event 
> thread was delayed before calling the parent watchers. Meanwhile a new 
> zookeeper event thread was created due to calling reset and this new thread 
> sent the SyncConnected event earlier than SessionExpired was sent to parent 
> watchers in the first thread. This resulted ConnectionStateListener instances 
> seeing the RECONNECTED before the LOST.
> h2. Logs:
> 2016-11-17T20:23:28.527Z [Thread-0-SendThread(]  INFO              
> ClientCnxn: Opening socket connection to server _
> 2016-11-17T20:23:28.535Z [Thread-0-SendThread(]  INFO              
> ClientCnxn: Socket connection established to _, initiating session
> 2016-11-17T20:23:28.576Z [Thread-0-SendThread(]  INFO              
> ClientCnxn: Unable to reconnect to ZooKeeper service, session 
> 0xc585ba1e7b6adc2 has expired, closing socket connection
> 2016-11-17T20:23:28.576Z [Thread-0-EventThread]  WARN         
> ConnectionState: Session expired event received
> 2016-11-17T20:23:28.673Z [Thread-0-EventThread]  INFO               
> ZooKeeper: Initiating client connection, connectString=_ sessionTimeout=30000 
> watcher=org.apache.curator.ConnectionState@6ddf3f9d
> 2016-11-17T20:23:28.691Z [Thread-0-SendThread(]  INFO              
> ClientCnxn: Opening socket connection to server _
> 2016-11-17T20:23:28.693Z [Thread-0-SendThread(]  INFO              
> ClientCnxn: Socket connection established to _, initiating session
> 2016-11-17T20:23:28.701Z [Thread-0-SendThread(]  INFO              
> ClientCnxn: Session establishment complete on server _, sessionid = 
> 0x2585ba1e69ffeca, negotiated timeout = 30000
> 2016-11-17T20:23:28.701Z [Thread-0-EventThread]  INFO  
> ConnectionStateManager: State change: RECONNECTED
> 2016-11-17T20:23:28.715Z [Thread-0-EventThread]  INFO  
> ConnectionStateManager: State change: LOST
> 2016-11-17T20:23:28.715Z [Thread-0-EventThread]  INFO              
> ClientCnxn: EventThread shut down
> h2. Reproduction:
> I was only able to reproduce the behaviour by adding artificial Thread.sleep 
> in ConnectionState#process before calling the parent watchers if the event is 
> session expired.
> {code:title=ConnectionState#process}
> @Override
> public void process(WatchedEvent event)
> {
>       if ( LOG_EVENTS )
>       {
>               log.debug("ConnectState watcher: " + event);
>       }
>       if ( event.getType() == Watcher.Event.EventType.None )
>       {
>               boolean wasConnected = isConnected.get();
>               boolean newIsConnected = checkState(event.getState(), 
> wasConnected);
>               if ( newIsConnected != wasConnected )
>               {
>                       isConnected.set(newIsConnected);
>                       connectionStartMs = System.currentTimeMillis();
>               }
>       }
>       if (event.getState() == KeeperState.Expired)
>       {
>               System.err.println("::> sleep in ConnectionState#process");
>               try {
>                       Thread.sleep(1000);
>               } catch (InterruptedException e) {}
>       }
>       for ( Watcher parentWatcher : parentWatchers )
>       {
>               TimeTrace timeTrace = new 
> TimeTrace("connection-state-parent-process", tracer.get());
>               parentWatcher.process(event);
>               timeTrace.commit();
>       }
> }
> {code}
> h2. Some ideas for fix:
>  * Add the event handling and calling parent watchers into a synchronized 
> block.
>  * Change the order of handling watched event and calling parent watchers 
> (I'm not aware of the behaviour implications of this).
>  * Move only calling reset to the end of the method "process".



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to