[jira] [Resolved] (GEODE-4096) Race Condition between ConcurrentSerialGatewaySenderEventProcessor stopper thread and the _dispatchBatch method for the connection global variable.

nabarun (JIRA) Thu, 04 Jan 2018 14:03:23 -0800

     [ https://issues.apache.org/jira/browse/GEODE-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


nabarun resolved GEODE-4096.
----------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0

> Race Condition between ConcurrentSerialGatewaySenderEventProcessor stopper 
> thread and the _dispatchBatch method for the connection global variable.
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-4096
>                 URL: https://issues.apache.org/jira/browse/GEODE-4096
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: nabarun
>            Assignee: nabarun
>             Fix For: 1.4.0
>
>
> *+Order of execution for this race condition to occur+*.
> # _dispatchBatch tries to dispatch a batch of events but is unsuccessful 
> for some reason.
> # It silently decides that the remote server may not be ready, so it wants 
> to retry.
> # At the same time, we decide to stop the 
> SerialGatewaySenderEventProcessor, so we call the stopper thread.
> # Before the threads are started on all the senders / dispatchers it sets the 
> isStopped flag for the SerialGatewaySenderEventProcessor to true.
> # Then the _dispatchBatch method, which is in retry mode, calls 
> getConnection to get the connection. This method checks the 
> SerialGatewaySenderEventProcessor's isStopped flag, sees that the flag is 
> set, and returns null.
> # This null is stored in the global variable connection for the dispatcher.
> # Now that the _dispatchBatch method sees that the connection is null, it 
> should raise an exception and call destroyConnection.
> # Meanwhile, an AckReaderThread is running and the stopper thread for the 
> event processor wants to stop it, but the connection global variable has 
> already been set to null by the getConnection call made by _dispatchBatch.
> # Hence shutDownAckReaderThreadConnection is executed on null, and the 
> AckReaderThread keeps running, stuck on socketRead0.
> # The problem is that the AckReaderThread acquires the 
> connectionLifeCycleLock.readLock to readAcknowledgement, while the 
> destroyConnection calls from the stopper thread and from _dispatchBatch's 
> exception handling code need the connectionLifeCycleLock.writeLock. They 
> cannot acquire it because the readLock is held by the AckReaderThread, 
> causing a deadlock.
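
The interleaving above can be reproduced in miniature. The following is only a sketch under stated assumptions, not Geode code: the class AckReaderDeadlockSketch, the latches, and the methods retryDispatch, startAckReader, tryDestroyConnection and runScenario are hypothetical stand-ins. A CountDownLatch plays the role of the socket the reader blocks on, and the write lock is requested with a timeout so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the dispatcher in the report; none of this is Geode code.
final class AckReaderDeadlockSketch {
    private final ReentrantReadWriteLock connectionLifeCycleLock = new ReentrantReadWriteLock();
    private final CountDownLatch readLockHeld = new CountDownLatch(1);
    // Stands in for the socket: the reader blocks on it the way it would on socketRead0.
    private final CountDownLatch socketClosed = new CountDownLatch(1);
    private volatile Object connection = new Object();
    private volatile boolean isStopped;

    // Returns null once the processor is flagged as stopped.
    Object getConnection() {
        return isStopped ? null : connection;
    }

    // The retry path stores whatever getConnection returns in the shared field.
    void retryDispatch() {
        connection = getConnection();
    }

    // Closing the connection is what would wake the reader; on null nothing happens.
    void shutDownAckReaderThreadConnection() {
        if (connection != null) {
            socketClosed.countDown();
        }
    }

    // The reader holds the read lock for as long as it is blocked on the "socket".
    Thread startAckReader() {
        Thread reader = new Thread(() -> {
            connectionLifeCycleLock.readLock().lock();
            try {
                readLockHeld.countDown();
                socketClosed.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                connectionLifeCycleLock.readLock().unlock();
            }
        }, "AckReaderThread");
        reader.setDaemon(true);
        reader.start();
        return reader;
    }

    // Needs the write lock; the timeout exists only so this sketch can report failure.
    boolean tryDestroyConnection(long timeoutMillis) {
        try {
            if (!connectionLifeCycleLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            connection = null;
            return true;
        } finally {
            connectionLifeCycleLock.writeLock().unlock();
        }
    }

    // Returns true if destroyConnection managed to take the write lock.
    static boolean runScenario(boolean retryRacesWithStop) {
        AckReaderDeadlockSketch sketch = new AckReaderDeadlockSketch();
        Thread reader = sketch.startAckReader();
        try {
            sketch.readLockHeld.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        sketch.isStopped = true;                    // stopper sets the flag
        if (retryRacesWithStop) {
            sketch.retryDispatch();                 // retry path nulls the shared field
        }
        sketch.shutDownAckReaderThreadConnection(); // no-op if the field is already null
        boolean destroyed = sketch.tryDestroyConnection(1000);
        reader.interrupt();                         // clean up the demo thread
        return destroyed;
    }

    public static void main(String[] args) {
        System.out.println("race:    destroyed = " + runScenario(true));
        System.out.println("no race: destroyed = " + runScenario(false));
    }
}
```

Calling runScenario(true) models the reported ordering: the shared field is already null when the stopper tries to shut the reader down, so the reader never releases the read lock and the write lock request times out. In the situation described in the report there is no timeout, so the same request would block indefinitely.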



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
