Barry Oglesby created GEODE-1172:
------------------------------------
Summary: Not all GatewaySenders are closed when a ForcedDisconnectException occurs
Key: GEODE-1172
URL: https://issues.apache.org/jira/browse/GEODE-1172
Project: Geode
Issue Type: Bug
Components: wan
Reporter: Barry Oglesby
When the cache is closed due to {{ForcedDisconnectException}}, only the first
{{GatewaySender}} is stopped. Other {{GatewaySenders}} are not stopped because
a {{ForcedDisconnectException}} is thrown while stopping the first one,
short-circuiting the rest.
For example, consider two {{GatewaySenders}} configured like:
{noformat}
<gateway-sender id="ny" remote-distributed-system-id="2"... />
<async-event-queue id="db" ... >
{noformat}
When the {{Cache}} was closed, only the {{GatewaySender}} underlying the db
{{AsyncEventQueue}} and its corresponding {{BatchRemovalThreads}} were closed:
{noformat}
[info 2016/04/04 17:40:45.844 PDT gateway-ln-1 <ReconnectThread> tid=0x67] Stopped SerialGatewaySender{id=AsyncEventQueue_db,remoteDsId=-1,isRunning =false,isPrimary =true}
[info 2016/04/04 17:40:45.845 PDT gateway-ln-1 <Thread-17> tid=0x51] The QueueRemovalThread is done.
[info 2016/04/04 17:40:45.845 PDT gateway-ln-1 <Thread-18> tid=0x53] The QueueRemovalThread is done.
[info 2016/04/04 17:40:45.845 PDT gateway-ln-1 <Thread-19> tid=0x55] The QueueRemovalThread is done.
[info 2016/04/04 17:40:45.845 PDT gateway-ln-1 <Thread-20> tid=0x57] The QueueRemovalThread is done.
[info 2016/04/04 17:40:45.846 PDT gateway-ln-1 <Thread-16> tid=0x4f] The QueueRemovalThread is done.
{noformat}
The ny {{GatewaySender}} was not closed; no log messages appear for it.
These {{ForcedDisconnectExceptions}} are caught while attempting to stop the
{{GatewaySenders}} when the member is forced out of the {{DistributedSystem}}:
{noformat}
[warning 2016/04/04 17:40:45.846 PDT gateway-ln-1 <ReconnectThread> tid=0x67] com.gemstone.gemfire.distributed.DistributedSystemDisconnectedException: This connection to a distributed system has been disconnected., caused by com.gemstone.gemfire.ForcedDisconnectException: for testing
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.checkConnected(InternalDistributedSystem.java:827)
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.getDistributionManager(InternalDistributedSystem.java:1421)
    at com.gemstone.gemfire.internal.cache.wan.AbstractGatewaySender.getDistributionManager(AbstractGatewaySender.java:449)
    at com.gemstone.gemfire.internal.cache.UpdateAttributesProcessor.sendProfileUpdate(UpdateAttributesProcessor.java:122)
    at com.gemstone.gemfire.internal.cache.UpdateAttributesProcessor.distribute(UpdateAttributesProcessor.java:98)
    at com.gemstone.gemfire.cache.asyncqueue.internal.SerialAsyncEventQueueImpl.stop(SerialAsyncEventQueueImpl.java:171)
    at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2078)
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1283)
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2663)
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2526)
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:947)
    at com.gemstone.gemfire.distributed.internal.DistributionManager$MyListener.membershipFailure(DistributionManager.java:4381)
    at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.uncleanShutdown(GMSMembershipManager.java:1580)
    at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager$5.run(GMSMembershipManager.java:2615)
    at java.lang.Thread.run(Thread.java:745)
[warning 2016/04/04 17:47:19.082 PDT gateway-ln-1 <ReconnectThread> tid=0x68] com.gemstone.gemfire.distributed.DistributedSystemDisconnectedException: This connection to a distributed system has been disconnected., caused by com.gemstone.gemfire.ForcedDisconnectException: for testing
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.checkConnected(InternalDistributedSystem.java:827)
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.getDistributionManager(InternalDistributedSystem.java:1421)
    at com.gemstone.gemfire.internal.cache.wan.AbstractGatewaySender.getDistributionManager(AbstractGatewaySender.java:449)
    at com.gemstone.gemfire.internal.cache.UpdateAttributesProcessor.sendProfileUpdate(UpdateAttributesProcessor.java:122)
    at com.gemstone.gemfire.internal.cache.UpdateAttributesProcessor.distribute(UpdateAttributesProcessor.java:98)
    at com.gemstone.gemfire.internal.cache.wan.GatewaySenderAdvisor.close(GatewaySenderAdvisor.java:744)
    at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2084)
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1283)
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2663)
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2526)
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:947)
    at com.gemstone.gemfire.distributed.internal.DistributionManager$MyListener.membershipFailure(DistributionManager.java:4381)
    at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.uncleanShutdown(GMSMembershipManager.java:1580)
    at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager$5.run(GMSMembershipManager.java:2615)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
The fixes are:
- to wrap the call in {{SerialGatewaySenderImpl.stop}} below with
{{CancelException}} handling:
{noformat}
new UpdateAttributesProcessor(this).distribute(false);
{noformat}
- to change the exception handling in {{GemFireCacheImpl.close}} to wrap each
individual call to {{GatewaySender.stop}} rather than the entire 'for' loop.
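The effect of the second fix can be sketched in isolation. This is a minimal illustration, not the actual {{GemFireCacheImpl.close}} code: {{Sender}}, {{CancelledException}}, and both {{stopAll...}} methods are hypothetical stand-ins, and every sender is assumed to throw from {{stop}} as it would after a forced disconnect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StopAllSenders {
    /** Hypothetical stand-in for GatewaySender; not the Geode interface. */
    interface Sender {
        String getId();
        void stop();
    }

    /** Hypothetical stand-in for an unchecked cancellation exception. */
    static class CancelledException extends RuntimeException {
        CancelledException(String message) { super(message); }
    }

    /** Builds a sender whose stop() always throws, as after a forced disconnect. */
    static Sender failing(final String id) {
        return new Sender() {
            public String getId() { return id; }
            public void stop() {
                throw new CancelledException("disconnected while stopping " + id);
            }
        };
    }

    /** Old shape: one try around the whole loop, so the first failure skips the rest. */
    static List<String> stopAllLoopWrapped(List<Sender> senders) {
        List<String> attempted = new ArrayList<>();
        try {
            for (Sender sender : senders) {
                attempted.add(sender.getId());
                sender.stop();
            }
        } catch (CancelledException e) {
            // Ignored: the cache is closing anyway, but remaining senders are never reached.
        }
        return attempted;
    }

    /** New shape: each stop() is wrapped individually, so every sender is attempted. */
    static List<String> stopAllEachWrapped(List<Sender> senders) {
        List<String> attempted = new ArrayList<>();
        for (Sender sender : senders) {
            attempted.add(sender.getId());
            try {
                sender.stop();
            } catch (CancelledException e) {
                // Ignored per sender; continue with the next one.
            }
        }
        return attempted;
    }

    public static void main(String[] args) {
        List<Sender> senders = Arrays.asList(failing("AsyncEventQueue_db"), failing("ny"));
        System.out.println(stopAllLoopWrapped(senders)); // [AsyncEventQueue_db]
        System.out.println(stopAllEachWrapped(senders)); // [AsyncEventQueue_db, ny]
    }
}
```

With the loop-level handler only the first sender is attempted, which matches the logging above; with per-call handling both are attempted.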
After these changes, both {{GatewaySenders}} and their corresponding
{{BatchRemovalThreads}} are closed:
{noformat}
[info 2016/04/04 17:53:02.397 PDT gateway-ln-1 <ReconnectThread> tid=0x68] Stopped SerialGatewaySender{id=AsyncEventQueue_db,remoteDsId=-1,isRunning =false,isPrimary =true}
[info 2016/04/04 17:53:02.398 PDT gateway-ln-1 <Thread-20> tid=0x57] The QueueRemovalThread is done.
[info 2016/04/04 17:53:02.398 PDT gateway-ln-1 <Thread-16> tid=0x4f] The QueueRemovalThread is done.
[info 2016/04/04 17:53:02.398 PDT gateway-ln-1 <Thread-17> tid=0x51] The QueueRemovalThread is done.
[info 2016/04/04 17:53:02.398 PDT gateway-ln-1 <Thread-18> tid=0x53] The QueueRemovalThread is done.
[info 2016/04/04 17:53:02.398 PDT gateway-ln-1 <Thread-19> tid=0x55] The QueueRemovalThread is done.
[info 2016/04/04 17:53:02.401 PDT gateway-ln-1 <GatewaySender Proxy Stomper> tid=0x7c] Destroying connection pool ny
[info 2016/04/04 17:53:02.401 PDT gateway-ln-1 <ReconnectThread> tid=0x68] Stopped SerialGatewaySender{id=ny,remoteDsId=1,isRunning =false,isPrimary =true}
[info 2016/04/04 17:53:02.401 PDT gateway-ln-1 <Thread-13> tid=0x44] The QueueRemovalThread is done.
[info 2016/04/04 17:53:02.402 PDT gateway-ln-1 <Thread-11> tid=0x40] The QueueRemovalThread is done.
[info 2016/04/04 17:53:02.402 PDT gateway-ln-1 <Thread-12> tid=0x42] The QueueRemovalThread is done.
[info 2016/04/04 17:53:02.402 PDT gateway-ln-1 <Thread-15> tid=0x48] The QueueRemovalThread is done.
[info 2016/04/04 17:53:02.402 PDT gateway-ln-1 <Thread-14> tid=0x46] The QueueRemovalThread is done.
{noformat}