[
https://issues.apache.org/jira/browse/GEODE-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
nabarun resolved GEODE-4096.
----------------------------
Resolution: Fixed
Fix Version/s: 1.4.0
> Race Condition between ConcurrentSerialGatewaySenderEventProcessor stopper
> thread and the _dispatchBatch method for the connection global variable.
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: GEODE-4096
> URL: https://issues.apache.org/jira/browse/GEODE-4096
> Project: Geode
> Issue Type: Bug
> Components: wan
> Reporter: nabarun
> Assignee: nabarun
> Fix For: 1.4.0
>
>
> *+Order of execution for this race condition to occur+*.
> # _dispatchBatch tries to dispatch a batch of events but is unsuccessful
> for some reason.
> # It silently assumes that the remote server may not be ready, so it decides
> to retry.
> # At the same time, we decide to stop the SerialGatewaySenderEventProcessor,
> so we call the stopper thread.
> # Before the threads are started on all the senders / dispatchers, the
> stopper sets the isStopped flag for the SerialGatewaySenderEventProcessor to
> true.
> # The _dispatchBatch method, which is still in retry mode, then makes a
> getConnection call to get the connection. This method checks the
> SerialGatewaySenderEventProcessor's isStopped flag, sees that the flag is
> set, and therefore returns null.
> # This null is stored in the global variable connection for the dispatcher.
> # Once the _dispatchBatch method sees that the connection is null, it is
> expected to raise an exception and call destroyConnection.
> # Meanwhile an AckReaderThread is running and the stopper thread for the
> event processor wants to stop it, but the connection global variable has
> already been set to null by the getConnection call made by _dispatchBatch.
> # Hence shutDownAckReaderThreadConnection is executed on null, and the
> AckReaderThread keeps running, stuck on socketRead0.
> # The problem is that the AckReaderThread holds
> connectionLifeCycleLock.readLock in order to readAcknowledgement, while the
> destroyConnection calls from the stopper thread and from _dispatchBatch's
> exception handling code need connectionLifeCycleLock.writeLock. They cannot
> acquire it because the readLock is held by the AckReaderThread, causing a
> deadlock.
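
The sequence above can be sketched in isolation as follows. This is only a minimal illustration, not the Geode code: the class AckReaderDeadlockSketch, the runScenario method, and the latch that stands in for the blocking socketRead0 call are all hypothetical, and the fields merely borrow the names used in the description. The write lock is requested with a timeout so that the sketch terminates instead of hanging the way the real threads would.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the dispatcher state; not the actual Geode classes.
final class AckReaderDeadlockSketch {
  static final ReentrantReadWriteLock connectionLifeCycleLock = new ReentrantReadWriteLock();
  static volatile Object connection;
  static volatile boolean isStopped;

  // Assumed behaviour from the description: once the processor is flagged as
  // stopped, getConnection returns null and the shared field ends up null.
  static Object getConnection() {
    if (isStopped) {
      connection = null;
    }
    return connection;
  }

  // Returns true if destroyConnection could have taken the write lock.
  static boolean runScenario() {
    connection = new Object();
    isStopped = false;
    CountDownLatch readLockHeld = new CountDownLatch(1);
    CountDownLatch socketData = new CountDownLatch(1); // stands in for socketRead0

    Thread ackReader = new Thread(() -> {
      connectionLifeCycleLock.readLock().lock();
      try {
        readLockHeld.countDown();
        socketData.await(); // blocked "reading" while holding the read lock
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        connectionLifeCycleLock.readLock().unlock();
      }
    });
    ackReader.start();

    try {
      readLockHeld.await();

      isStopped = true;  // stopper thread sets the flag first
      getConnection();   // retrying dispatcher overwrites connection with null

      // The stopper would now close the reader's socket through connection,
      // but connection is null, so the reader is never unblocked.
      boolean readerCanBeShutDown = (connection != null);

      // destroyConnection needs the write lock; the reader still holds the read lock.
      boolean gotWriteLock =
          connectionLifeCycleLock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
      if (gotWriteLock) {
        connectionLifeCycleLock.writeLock().unlock();
      }

      // Clean up so the sketch exits; the real threads had no such escape hatch.
      socketData.countDown();
      ackReader.join();
      return gotWriteLock && readerCanBeShutDown;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("write lock acquired: " + runScenario());
  }
}
```

Under these assumptions the write lock request times out, which corresponds to the stopper thread and _dispatchBatch waiting forever in destroyConnection in the real scenario.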