I added a stack trace to the closing of both GemFireCacheImpl and InternalDistributedSystem and found a difference.
The test passes when it's the test thread doing the close:

java.lang.Throwable: KIRK GemFireCacheImpl closed 1046056441
    at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2365)
    at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1912)
    at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1902)
    at org.apache.geode.cache.CacheFactoryRecreateRegressionTest.recreateDoesNotThrowDistributedSystemDisconnectedException(CacheFactoryRecreateRegressionTest.java:56)

java.lang.Throwable: KIRK InternalDistributedSystem closed 1311844206
    at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1637)
    at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1225)
    at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2351)
    at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1912)
    at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1902)
    at org.apache.geode.cache.CacheFactoryRecreateRegressionTest.recreateDoesNotThrowDistributedSystemDisconnectedException(CacheFactoryRecreateRegressionTest.java:56)

When the test fails and reproduces the problem, the close is apparently completed by a different background thread:

java.lang.Throwable: KIRK GemFireCacheImpl closed 277876155
    at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2365)
    at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1917)
    at org.apache.geode.internal.cache.DiskStoreImpl.lambda$handleDiskAccessException$2(DiskStoreImpl.java:3380)
    at java.lang.Thread.run(Thread.java:748)

java.lang.Throwable: KIRK InternalDistributedSystem closed 306674056
    at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1637)
    at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1225)
    at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2351)
    at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1917)
    at org.apache.geode.internal.cache.DiskStoreImpl.lambda$handleDiskAccessException$2(DiskStoreImpl.java:3380)
    at java.lang.Thread.run(Thread.java:748)

On Tue, Nov 26, 2019 at 9:20 AM Kirk Lund <kl...@apache.org> wrote:

> Seems like this must be a bug, so I filed
> https://issues.apache.org/jira/browse/GEODE-7503. I'll look into it...
>
> On Mon, Nov 25, 2019 at 3:24 PM Anilkumar Gingade <aging...@pivotal.io>
> wrote:
>
>> Looking at the code, the cache.close() and InternalCacheBuilder.create()
>> are synchronized on "GemFireCacheImpl.class"'; it's the
>> internalCachebuilder create that seems to be using reference to the old
>> distributed-system.
>> The GemFireCacheImpl.getInstance() and getExisting() both perform
>> "isClosing" check and does early return. The InternalCacheBuilder is new;
>> not sure if its missing early checks.
>>
>> -Anil.
>>
>> On Mon, Nov 25, 2019 at 2:47 PM Mark Hanson <mhan...@pivotal.io> wrote:
>>
>> > +1 to fix.
>> >
>> > > On Nov 25, 2019, at 2:02 PM, John Blum <jb...@pivotal.io> wrote:
>> > >
>> > > +1 ^ 64!
>> > >
>> > > I found this out the hard way some time ago and is why STDG exists in the
>> > > first place (i.e. usability issues, particularly with testing).
>> > >
>> > > On Mon, Nov 25, 2019 at 1:41 PM Kirk Lund <kl...@apache.org> wrote:
>> > >
>> > >> I found a test that closes the cache and then recreates the cache multiple
>> > >> times with 2 second sleep between each. I tried to remove the Thread.sleep
>> > >> and found that recreating the cache
>> > >> throws DistributedSystemDisconnectedException (see below).
>> > >>
>> > >> This seems like a usability nightmare. Anyone have any ideas WHY it's this
>> > >> way?
>> > >>
>> > >> Personally, I want Cache.close() to block until both Cache and
>> > >> DistributedSystem are closed and the API is ready to create a new Cache.
>> > >>
>> > >> org.apache.geode.distributed.DistributedSystemDisconnectedException: This connection to a distributed system has been disconnected.
>> > >>     at org.apache.geode.distributed.internal.InternalDistributedSystem.checkConnected(InternalDistributedSystem.java:945)
>> > >>     at org.apache.geode.distributed.internal.InternalDistributedSystem.getDistributionManager(InternalDistributedSystem.java:1665)
>> > >>     at org.apache.geode.internal.cache.GemFireCacheImpl.<init>(GemFireCacheImpl.java:791)
>> > >>     at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:187)
>> > >>     at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:158)
>> > >>     at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:142)
>> > >>
>> > >
>> > >
>> > > --
>> > > -John
>> > > john.blum10101 (skype)
>> >
>> >
>>
>
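
For anyone following the thread, here is a toy Java sketch of the behavior requested above, where close blocks until the background disconnect has finished before a new cache is created. It is only a model under stated assumptions: ToySystem, ToyCache, CURRENT, create() and closeAndRecreate() are hypothetical stand-ins, not Geode classes or methods, and the real fix for GEODE-7503 may look quite different.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Toy model only: none of these types are Geode APIs. */
public class BlockingCloseSketch {

    /** Hypothetical stand-in for a distributed system connection. */
    static final class ToySystem {
        private volatile boolean connected = true;
        boolean isConnected() { return connected; }
        void disconnect() { connected = false; }
    }

    /** Shared reference that cache creation consults, like a singleton system. */
    static final AtomicReference<ToySystem> CURRENT = new AtomicReference<>();

    /** Hypothetical stand-in for a cache whose close finishes on another thread. */
    static final class ToyCache {
        final ToySystem system;
        private final CountDownLatch closed = new CountDownLatch(1);

        ToyCache(ToySystem system) { this.system = system; }

        /** Starts the close on a background thread and returns immediately. */
        void closeAsync() {
            new Thread(() -> {
                system.disconnect();
                closed.countDown(); // signal that the disconnect is done
            }, "toy-closer").start();
        }

        /** Waits until the background close has finished or the timeout expires. */
        boolean awaitClosed(long timeout, TimeUnit unit) throws InterruptedException {
            return closed.await(timeout, unit);
        }
    }

    /** Reuses the shared system if it still looks connected, else connects a new one. */
    static ToyCache create() {
        ToySystem system = CURRENT.updateAndGet(
            s -> (s != null && s.isConnected()) ? s : new ToySystem());
        return new ToyCache(system);
    }

    /** Closes the old cache, blocks until the close completes, then creates a new cache. */
    static ToyCache closeAndRecreate(ToyCache old) {
        old.closeAsync();
        try {
            if (!old.awaitClosed(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("close did not complete in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for close", e);
        }
        return create();
    }
}
```

In this model, calling create() right after closeAsync() without the wait could hand back a cache bound to the old system just before it is disconnected, which is roughly the shape of the failure in the traces. Waiting on the latch first means create() sees the disconnected system and connects a new one.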