[ https://issues.apache.org/jira/browse/GEODE-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606173#comment-16606173 ]
Jason Huynh commented on GEODE-5700:
------------------------------------
A race condition between stopping a cache server and a thread volunteering to
become primary can cause this issue. The volunteering thread would usually
throw a cancel exception, but in this case the cache is not closed or closing
at that moment, so it throws an IllegalStateException instead. This exception
gets logged, whereas a cancel exception does not.
The volunteer for primary must also have obtained the list of cache servers
before the cache server shut down. It iterates over this stale list and must
have gotten past the isRunning() check before the server was stopped...
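To make the suspected interleaving concrete, here is a minimal sketch that replays it deterministically on one thread. Everything in it is hypothetical: FakeCacheServer, volunteer() and replay() are illustration-only names, not Geode's CacheServerImpl or BucketAdvisor code. It only shows the check-then-act shape: a snapshot of servers, a passing isRunning() check, a stop in the window, and then the address lookup failing.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleServerRaceSketch {

    /** Hypothetical stand-in for a cache server; not Geode's CacheServerImpl. */
    static final class FakeCacheServer {
        private volatile boolean running = true;

        boolean isRunning() {
            return running;
        }

        void stop() {
            running = false;
        }

        String getExternalAddress() {
            if (!running) {
                throw new IllegalStateException(
                    "bind address is only available if the server has been started");
            }
            return "localhost";
        }
    }

    /**
     * Walks a snapshot of servers the way a volunteering thread might.
     * afterCheck runs in the window between isRunning() and
     * getExternalAddress(), standing in for whatever another thread does there.
     */
    static String volunteer(List<FakeCacheServer> snapshot, Runnable afterCheck) {
        for (FakeCacheServer server : snapshot) {
            if (server.isRunning()) {
                afterCheck.run();
                try {
                    return server.getExternalAddress();
                } catch (IllegalStateException e) {
                    return "IllegalStateException";
                }
            }
        }
        return "no running server";
    }

    /** Replays the scenario with or without a stop inside the window. */
    static String replay(boolean stopDuringWindow) {
        FakeCacheServer server = new FakeCacheServer();
        List<FakeCacheServer> snapshot = new ArrayList<>();
        snapshot.add(server); // the list is captured before any shutdown
        Runnable afterCheck = stopDuringWindow ? server::stop : () -> { };
        return volunteer(snapshot, afterCheck);
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // no stop in the window
        System.out.println(replay(true));  // server stopped after the check
    }
}
```

In a real run the stop would come from a different thread, so the window is only hit occasionally, which would be consistent with the low failure rate seen in CI.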
I think that in the ServerStarterRule we close the cacheServer explicitly
without closing the cache, and a few lines later we close the cache. I am not
sure why this is done in two steps (I believe Cache close should stop all
cache servers).
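As a rough illustration of why the teardown order could matter, the sketch below compares the two orders under the assumptions stated in this comment. FakeCache, FakeCancelException and outcome() are hypothetical names and do not mirror Geode's internals or the ServerStarterRule; the only point is that a lookup made after "stop server only" sees an IllegalStateException, while a lookup made after a full cache close sees the cancel path.

```java
public class TeardownOrderSketch {

    /** Hypothetical stand-in for a cancel exception raised while the cache closes. */
    static final class FakeCancelException extends RuntimeException {
        FakeCancelException(String message) {
            super(message);
        }
    }

    /** Hypothetical cache that owns a single server. */
    static final class FakeCache {
        private boolean closing = false;
        private boolean serverRunning = true;

        /** Step one of a two-step teardown: the server stops, the cache stays open. */
        void stopServerOnly() {
            serverRunning = false;
        }

        /** Assumed behaviour: closing the cache also stops its server. */
        void close() {
            closing = true;
            serverRunning = false;
        }

        String getServerAddress() {
            if (closing) {
                throw new FakeCancelException("cache is closing");
            }
            if (!serverRunning) {
                throw new IllegalStateException("server has not been started");
            }
            return "localhost";
        }
    }

    /** Reports which failure a late address lookup would see for each teardown order. */
    static String outcome(boolean closeCacheDirectly) {
        FakeCache cache = new FakeCache();
        if (closeCacheDirectly) {
            cache.close();
        } else {
            cache.stopServerOnly();
        }
        try {
            return cache.getServerAddress();
        } catch (FakeCancelException e) {
            return "cancel";
        } catch (IllegalStateException e) {
            return "IllegalStateException";
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(false)); // two-step teardown, lookup lands between the steps
        System.out.println(outcome(true));  // cache closed in one step
    }
}
```

If the hypothesis holds, closing only the cache in the rule would move the late lookup onto the cancel path, which is not logged as a suspect string.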
Going to investigate a bit further but this is my hypothesis for now...
> CI failures from new tests in PartitionedRegionCompactRangeIndexDUnitTest
> -------------------------------------------------------------------------
>
> Key: GEODE-5700
> URL: https://issues.apache.org/jira/browse/GEODE-5700
> Project: Geode
> Issue Type: Improvement
> Reporter: Dan Smith
> Assignee: Jason Huynh
> Priority: Major
> Labels: swat
>
> We are seeing a couple of the new tests in
> PartitionedRegionCompactRangeIndexDUnitTest fail in CI
> {noformat}
> org.apache.geode.cache.query.dunit.PartitionedRegionCompactRangeIndexDUnitTest: 2 failures (99.265% success rate)
> | .giiWithPersistenceAndStaleDataDueToDeletesShouldHaveEmptyIndexesWithEntrySet: 1 failures (99.632% success rate)
> | | Failed build 376 at https://concourse.apachegeode-ci.info/teams/staging/pipelines/concourse-staging/jobs/DistributedTest/builds/376
> | .giiWithPersistenceAndStaleDataDueToDeletesShouldHaveEmptyIndexes: 1 failures (99.632% success rate)
> | | Failed build 499 at https://concourse.apachegeode-ci.info/teams/staging/pipelines/concourse-staging/jobs/DistributedTest/builds/499
> {noformat}
> {noformat}
> org.apache.geode.cache.query.dunit.PartitionedRegionCompactRangeIndexDUnitTest > giiWithPersistenceAndStaleDataDueToDeletesShouldHaveEmptyIndexes FAILED
> java.lang.AssertionError: Suspicious strings were written to the log during this run.
> Fix the strings or use IgnoredException.addIgnoredException to ignore.
> -----------------------------------------------------------------------
> Found suspect string in log4j at line 7947
>
> [error 2018/08/30 21:32:07.028 UTC <Pooled Waiting Message Processor 1> tid=0x9d6] A bridge server's bind address is only available if it has been started
> java.lang.IllegalStateException: A bridge server's bind address is only available if it has been started
>     at org.apache.geode.internal.cache.CacheServerImpl.getExternalAddress(CacheServerImpl.java:415)
>     at org.apache.geode.internal.cache.CacheServerImpl.getExternalAddress(CacheServerImpl.java:407)
>     at org.apache.geode.internal.cache.BucketAdvisor.instantiateProfile(BucketAdvisor.java:1690)
>     at org.apache.geode.distributed.internal.DistributionAdvisor.createProfile(DistributionAdvisor.java:1026)
>     at org.apache.geode.internal.cache.BucketAdvisor.sendProfileUpdate(BucketAdvisor.java:1651)
>     at org.apache.geode.internal.cache.BucketAdvisor.acquiredPrimaryLock(BucketAdvisor.java:1196)
>     at org.apache.geode.internal.cache.BucketAdvisor$VolunteeringDelegate.doVolunteerForPrimary(BucketAdvisor.java:2586)
>     at org.apache.geode.internal.cache.BucketAdvisor$VolunteeringDelegate$1.run(BucketAdvisor.java:2484)
>     at org.apache.geode.internal.cache.BucketAdvisor$VolunteeringDelegate$2.run(BucketAdvisor.java:2803)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1136)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:112)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager$6$1.run(ClusterDistributionManager.java:882)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}