I added stack-trace logging to the close of both GemFireCacheImpl and
InternalDistributedSystem and found a difference between passing and failing runs.

The test passes when it's the test thread doing the close:

java.lang.Throwable: KIRK GemFireCacheImpl closed 1046056441
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2365)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1912)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1902)
        at org.apache.geode.cache.CacheFactoryRecreateRegressionTest.recreateDoesNotThrowDistributedSystemDisconnectedException(CacheFactoryRecreateRegressionTest.java:56)
java.lang.Throwable: KIRK InternalDistributedSystem closed 1311844206
        at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1637)
        at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1225)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2351)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1912)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1902)
        at org.apache.geode.cache.CacheFactoryRecreateRegressionTest.recreateDoesNotThrowDistributedSystemDisconnectedException(CacheFactoryRecreateRegressionTest.java:56)

When the test fails and reproduces the problem, the close is apparently
completed by a background thread (started from
DiskStoreImpl.handleDiskAccessException) rather than by the test thread:

java.lang.Throwable: KIRK GemFireCacheImpl closed 277876155
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2365)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1917)
        at org.apache.geode.internal.cache.DiskStoreImpl.lambda$handleDiskAccessException$2(DiskStoreImpl.java:3380)
        at java.lang.Thread.run(Thread.java:748)
java.lang.Throwable: KIRK InternalDistributedSystem closed 306674056
        at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1637)
        at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1225)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2351)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1917)
        at org.apache.geode.internal.cache.DiskStoreImpl.lambda$handleDiskAccessException$2(DiskStoreImpl.java:3380)
        at java.lang.Thread.run(Thread.java:748)
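
One possible reading of the second trace (an assumption, not confirmed by the
logs above): the background thread from DiskStoreImpl.handleDiskAccessException
starts the close first, so the test thread's own close() finds a close already
in progress and returns before the distributed system has finished
disconnecting. The minimal Java sketch below uses a hypothetical Resource class,
not Geode code, to show a close() where a second caller waits for the
in-progress close instead of returning early:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch (not Geode code): a close() that any thread may start,
// where a second caller waits for the in-progress close instead of returning early.
final class BlockingClose {

  static final class Resource {
    private final AtomicBoolean closing = new AtomicBoolean(false);
    private final CountDownLatch closed = new CountDownLatch(1);
    private volatile boolean disconnected = false;

    void close() throws InterruptedException {
      if (closing.compareAndSet(false, true)) {
        // The first caller performs the actual teardown.
        try {
          doClose();
        } finally {
          closed.countDown(); // release any thread waiting in close()
        }
      } else {
        // A close is already in progress on another thread: block until it finishes.
        closed.await();
      }
    }

    void closeQuietly() {
      try {
        close();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }

    boolean isClosing() {
      return closing.get();
    }

    boolean isDisconnected() {
      return disconnected;
    }

    private void doClose() {
      try {
        Thread.sleep(100); // simulate a slow distributed-system disconnect
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      disconnected = true;
    }
  }

  // Returns whether the resource was fully disconnected when the second close() returned.
  static boolean demo() {
    Resource resource = new Resource();
    // Stands in for the background thread that starts the close.
    Thread background = new Thread(resource::closeQuietly, "background-close");
    background.start();
    while (!resource.isClosing()) {
      Thread.yield(); // wait until the background close has started
    }
    resource.closeQuietly(); // the "test thread" close
    boolean fullyClosed = resource.isDisconnected();
    try {
      background.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return fullyClosed;
  }

  public static void main(String[] args) {
    System.out.println("fully closed after close(): " + demo());
  }
}
```

Because the second caller blocks on the latch, its close() only returns after
the simulated disconnect has completed, so a create() issued right afterwards
would not see a half-closed system.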

On Tue, Nov 26, 2019 at 9:20 AM Kirk Lund <kl...@apache.org> wrote:

> Seems like this must be a bug, so I filed
> https://issues.apache.org/jira/browse/GEODE-7503. I'll look into it...
>
> On Mon, Nov 25, 2019 at 3:24 PM Anilkumar Gingade <aging...@pivotal.io>
> wrote:
>
>> Looking at the code, cache.close() and InternalCacheBuilder.create()
>> are both synchronized on GemFireCacheImpl.class; it's the
>> InternalCacheBuilder create that seems to be using a reference to the old
>> distributed system.
>> GemFireCacheImpl.getInstance() and getExisting() both perform an
>> "isClosing" check and return early. InternalCacheBuilder is new; I'm
>> not sure if it's missing those early checks.
>>
>> -Anil.
>>
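
Anil's point about a possibly missing early check in InternalCacheBuilder can
be illustrated with a hypothetical sketch (Connection, obtainConnection and
closeExisting are invented for illustration, not Geode APIs): under the same
lock that close uses, the builder reuses the existing connection only if it is
still connected, and otherwise creates a new one.

```java
import java.util.function.Supplier;

// Hypothetical sketch (not Geode code) of an "early check" in a builder:
// reuse the previously created connection only if it is still connected.
final class BuilderEarlyCheck {

  static final class Connection {
    private volatile boolean connected = true;

    void disconnect() {
      connected = false;
    }

    boolean isConnected() {
      return connected;
    }
  }

  // Plays the role of the shared lock (GemFireCacheImpl.class in the thread).
  private static final Object LOCK = new Object();
  private static Connection existing; // guarded by LOCK

  static Connection obtainConnection(Supplier<Connection> factory) {
    synchronized (LOCK) {
      // Early check: never hand out a connection that has already been disconnected.
      if (existing == null || !existing.isConnected()) {
        existing = factory.get();
      }
      return existing;
    }
  }

  static void closeExisting() {
    synchronized (LOCK) {
      if (existing != null) {
        existing.disconnect();
      }
    }
  }

  public static void main(String[] args) {
    Connection first = obtainConnection(Connection::new);
    closeExisting();
    Connection second = obtainConnection(Connection::new);
    System.out.println("new connection created: " + (first != second));
    System.out.println("second is connected: " + second.isConnected());
  }
}
```

The alternative design would be for the builder to wait until an in-progress
close finishes; this sketch only shows the check-and-replace option.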
>> On Mon, Nov 25, 2019 at 2:47 PM Mark Hanson <mhan...@pivotal.io> wrote:
>>
>> > +1 to fix.
>> >
>> > > On Nov 25, 2019, at 2:02 PM, John Blum <jb...@pivotal.io> wrote:
>> > >
>> > > +1 ^ 64!
>> > >
>> > > I found this out the hard way some time ago, and it's why STDG exists in
>> > > the first place (i.e. usability issues, particularly with testing).
>> > >
>> > > On Mon, Nov 25, 2019 at 1:41 PM Kirk Lund <kl...@apache.org> wrote:
>> > >
>> > >> I found a test that closes the cache and then recreates it multiple
>> > >> times with a 2 second sleep between each. I tried to remove the
>> > >> Thread.sleep and found that recreating the cache
>> > >> throws DistributedSystemDisconnectedException (see below).
>> > >>
>> > >> This seems like a usability nightmare. Anyone have any ideas WHY it's
>> > >> this way?
>> > >>
>> > >> Personally, I want Cache.close() to block until both Cache and
>> > >> DistributedSystem are closed and the API is ready to create a new
>> > >> Cache.
>> > >>
>> > >> org.apache.geode.distributed.DistributedSystemDisconnectedException: This connection to a distributed system has been disconnected.
>> > >>        at org.apache.geode.distributed.internal.InternalDistributedSystem.checkConnected(InternalDistributedSystem.java:945)
>> > >>        at org.apache.geode.distributed.internal.InternalDistributedSystem.getDistributionManager(InternalDistributedSystem.java:1665)
>> > >>        at org.apache.geode.internal.cache.GemFireCacheImpl.<init>(GemFireCacheImpl.java:791)
>> > >>        at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:187)
>> > >>        at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:158)
>> > >>        at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:142)
>> > >>
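
Until Cache.close() blocks like that, a test can replace the fixed 2 second
Thread.sleep with a bounded poll for whatever condition shows that the previous
cache and distributed system have finished closing. A minimal sketch of such a
helper follows; pollUntil is hypothetical, not a Geode or JDK API, and the
condition to poll is left to the test.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical test helper (not part of Geode): wait for a condition with a
// timeout instead of sleeping for a fixed interval.
final class PollUntil {

  // Returns true as soon as the condition holds, or false once the timeout elapses.
  static boolean pollUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    long start = System.nanoTime();
    // A condition that becomes true after roughly 100 ms.
    boolean met = pollUntil(() -> System.nanoTime() - start > 100_000_000L,
        Duration.ofSeconds(2), Duration.ofMillis(10));
    System.out.println("condition met: " + met);
  }
}
```

Compared with a fixed sleep, the poll returns as soon as the condition holds
and fails clearly after the timeout; a library such as Awaitility offers the
same pattern.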
>> > >
>> > >
>> > > --
>> > > -John
>> > > john.blum10101 (skype)
>> >
>> >
>>
>
