[
https://issues.apache.org/jira/browse/GEODE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16946353#comment-16946353
]
Barrett Oglesby commented on GEODE-7277:
----------------------------------------
Here is some logging the monitoringRegion being destroying as a result of the
cache closing:
{noformat}
[warn 2019/10/07 17:03:27.717 PDT <Shutdown member> tid=0x66] XXX
LocalRegion.basicDestroyRegion region=/_monitoringRegion_192.168.1.4<v1>41001
java.lang.Exception
at
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6132)
at
org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1821)
at
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6127)
at
org.apache.geode.internal.cache.LocalRegion.localDestroyRegion(LocalRegion.java:2250)
at
org.apache.geode.internal.cache.AbstractRegion.localDestroyRegion(AbstractRegion.java:452)
at
org.apache.geode.management.internal.FederatingManager.removeMemberArtifacts(FederatingManager.java:216)
at
org.apache.geode.management.internal.FederatingManager.stopManagingActivity(FederatingManager.java:147)
at
org.apache.geode.management.internal.FederatingManager.stopManager(FederatingManager.java:135)
at
org.apache.geode.management.internal.SystemManagementService.close(SystemManagementService.java:243)
at
org.apache.geode.management.internal.beans.ManagementAdapter.handleCacheRemoval(ManagementAdapter.java:745)
at
org.apache.geode.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:131)
at
org.apache.geode.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2052)
at
org.apache.geode.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:599)
at
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2098)
at
org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1503)
at
org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1182)
at
org.apache.geode.management.internal.beans.MemberMBeanBridge.lambda$shutDownMember$0(MemberMBeanBridge.java:810)
at java.base/java.lang.Thread.run(Thread.java:834)
[info 2019/10/07 17:03:27.734 PDT <Shutdown member> tid=0x66] GemFireCache[id =
765950170; isClosing = true; isShutDownAll = false; created = Mon Oct 07
17:01:47 PDT 2019; server = false; copyOnRead = false; lockLease = 120;
lockTimeout = 60]: Now closing.
{noformat}
Here is some logging showing the monitoringRegion being destroyed as a result
of the member departing:
{noformat}
[info 2019/10/07 17:18:51.964 PDT <Pooled High Priority Message Processor 2>
tid=0x5a] Member at 192.168.1.4(server-1:35877)<v1>:41001 gracefully left the
distributed cache: shutdown message received
[warn 2019/10/07 17:18:51.965 PDT <DM-MemberEventInvoker> tid=0x1e] XXX
FederatingManager.removeMember member=192.168.1.4(server-1:35877)<v1>:41001
java.lang.Exception
at
org.apache.geode.management.internal.FederatingManager.removeMember(FederatingManager.java:188)
at
org.apache.geode.management.internal.ManagementMembershipListener.memberDeparted(ManagementMembershipListener.java:57)
at
org.apache.geode.distributed.internal.ClusterDistributionManager$MemberDepartedEvent.handleEvent(ClusterDistributionManager.java:3543)
at
org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:3473)
at
org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:3462)
at
org.apache.geode.distributed.internal.ClusterDistributionManager.handleMemberEvent(ClusterDistributionManager.java:2023)
at
org.apache.geode.distributed.internal.ClusterDistributionManager.access$400(ClusterDistributionManager.java:114)
at
org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEventInvoker.run(ClusterDistributionManager.java:2055)
at java.base/java.lang.Thread.run(Thread.java:834)
[warn 2019/10/07 17:18:51.966 PDT <FederatingManager2> tid=0x62] XXX
RemoveMemberTask.run member=192.168.1.4(server-1:35877)<v1>:41001
[warn 2019/10/07 17:18:51.967 PDT <FederatingManager2> tid=0x62] XXX
LocalRegion.basicDestroyRegion region=/_monitoringRegion_192.168.1.4<v1>41001
java.lang.Exception
at
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6132)
at
org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1821)
at
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6127)
at
org.apache.geode.internal.cache.LocalRegion.localDestroyRegion(LocalRegion.java:2250)
at
org.apache.geode.internal.cache.AbstractRegion.localDestroyRegion(AbstractRegion.java:452)
at
org.apache.geode.management.internal.FederatingManager.removeMemberArtifacts(FederatingManager.java:216)
at
org.apache.geode.management.internal.FederatingManager.access$000(FederatingManager.java:66)
at
org.apache.geode.management.internal.FederatingManager$RemoveMemberTask.run(FederatingManager.java:317)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}
If the cache close causes the monitoringRegion to be destroyed first, then the
RemoveMemberTask will fail with the RegionDestroyedException.
It looks like these calls in FederatingManager.removeMemberArtifacts need to be
separately wrapped in try/catch blocks:
{noformat}
proxyRegion.localDestroyRegion();
notificationRegion.localDestroyRegion();
{noformat}
> CI failure: DestroyLuceneIndexCommandsDUnitTest >
> testDestroyIndexesWithOneIndexAndRegionInOneMember failed with
> RegionDestroyedException suspicious string
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: GEODE-7277
> URL: https://issues.apache.org/jira/browse/GEODE-7277
> Project: Geode
> Issue Type: Bug
> Components: jmx
> Reporter: Barrett Oglesby
> Priority: Major
>
> DistributedTestOpenJDK11 #1147
> (https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/1147)
> failed with this RegionDestroyedException suspicious string:
> {noformat}
> org.apache.geode.cache.lucene.internal.cli.DestroyLuceneIndexCommandsDUnitTest
> > testDestroyIndexesWithOneIndexAndRegionInOneMember FAILED
> java.lang.AssertionError: Suspicious strings were written to the log
> during this run.
> Fix the strings or use IgnoredException.addIgnoredException to ignore.
> -----------------------------------------------------------------------
> Found suspect string in log4j at line 2012
> [fatal 2019/10/07 21:18:02.334 GMT <FederatingManager4> tid=845] Uncaught
> exception in thread Thread[FederatingManager4,5,RMI Runtime]
> org.apache.geode.cache.RegionDestroyedException:
> org.apache.geode.internal.cache.DistributedRegion[path='/_monitoringRegion_172.17.0.10<v2>41002';scope=DISTRIBUTED_NO_ACK';dataPolicy=REPLICATE]
> at
> org.apache.geode.internal.cache.LocalRegion.checkRegionDestroyed(LocalRegion.java:7293)
> at
> org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6157)
> at
> org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1821)
> at
> org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6128)
> at
> org.apache.geode.internal.cache.LocalRegion.localDestroyRegion(LocalRegion.java:2251)
> at
> org.apache.geode.internal.cache.AbstractRegion.localDestroyRegion(AbstractRegion.java:452)
> at
> org.apache.geode.management.internal.FederatingManager.removeMemberArtifacts(FederatingManager.java:216)
> at
> org.apache.geode.management.internal.FederatingManager.access$000(FederatingManager.java:67)
> at
> org.apache.geode.management.internal.FederatingManager$RemoveMemberTask.run(FederatingManager.java:564)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}
> This failure has nothing to do with Lucene.
> Some analysis shows that stopping a member and stopping the locator nearly
> simultaneously can cause this RegionDestroyedException.
> The test output shows:
> The locator starts:
> {noformat}
> [vm0] [info 2019/10/07 21:17:55.174 GMT <RMI TCP Connection(1)-172.17.0.10>
> tid=0x22] Starting DistributionManager
> 172.17.0.10(locator-0:87:locator)<ec><v0>:41000. (took 26 ms)
> {noformat}
> Server1 starts and _monitoringRegion_172.17.0.10<v1>41001 is created for it
> in the locator:
> {noformat}
> [vm1] [info 2019/10/07 21:17:59.681 GMT <RMI TCP Connection(1)-172.17.0.10>
> tid=0x22] Starting DistributionManager 172.17.0.10(server-1:95)<v1>:41001.
> (took 405 ms)
> [vm0] [info 2019/10/07 21:17:59.684 GMT <FederatingManager1> tid=0x332]
> Initializing region _monitoringRegion_172.17.0.10<v1>41001
> [vm0] [info 2019/10/07 21:17:59.694 GMT <FederatingManager1> tid=0x332]
> Initialization of region _monitoringRegion_172.17.0.10<v1>41001 completed
> {noformat}
> Server2 starts and _monitoringRegion_172.17.0.10<v2>41002 is created for it
> in the locator:
> {noformat}
> [vm2] [info 2019/10/07 21:18:00.379 GMT <RMI TCP Connection(1)-172.17.0.10>
> tid=0x22] Starting DistributionManager 172.17.0.10(server-2:104)<v2>:41002.
> (took 361 ms)
> [vm0] [info 2019/10/07 21:18:00.379 GMT <FederatingManager2> tid=0x33c]
> Initializing region _monitoringRegion_172.17.0.10<v2>41002
> [vm0] [info 2019/10/07 21:18:00.424 GMT <FederatingManager2> tid=0x33c]
> Initialization of region _monitoringRegion_172.17.0.10<v2>41002 completed
> {noformat}
> Server2 shuts down:
> {noformat}
> [vm2] [info 2019/10/07 21:18:02.007 GMT <RMI TCP Connection(1)-172.17.0.10>
> tid=0x22] Shutting down DistributionManager
> 172.17.0.10(server-2:104)<v2>:41002.
> [vm2] [info 2019/10/07 21:18:02.127 GMT <RMI TCP Connection(1)-172.17.0.10>
> tid=0x22] Marking DistributionManager 172.17.0.10(server-2:104)<v2>:41002 as
> closed.
> {noformat}
> Server1 shuts down:
> {noformat}
> [vm1] [info 2019/10/07 21:18:02.189 GMT <RMI TCP Connection(1)-172.17.0.10>
> tid=0x22] Shutting down DistributionManager
> 172.17.0.10(server-1:95)<v1>:41001.
> [vm1] [info 2019/10/07 21:18:02.309 GMT <RMI TCP Connection(1)-172.17.0.10>
> tid=0x22] Marking DistributionManager 172.17.0.10(server-1:95)<v1>:41001 as
> closed.
> {noformat}
> The locator logs server1 leaving the distributed system:
> {noformat}
> [vm0] [info 2019/10/07 21:18:02.316 GMT <View Message Processor1> tid=0x331]
> Member at 172.17.0.10(server-1:95)<v1>:41001 gracefully left the distributed
> cache: departed membership view
> {noformat}
> Then, the locator starts to shut down:
> {noformat}
> [vm0] [info 2019/10/07 21:18:02.322 GMT <RMI TCP Connection(1)-172.17.0.10>
> tid=0x22] Stopping Distribution Locator on 7f8e3ffa7dc2[34127]
> {noformat}
> Then, the locator logs server2 leaving the distributed system:
> {noformat}
> [vm0] [info 2019/10/07 21:18:02.323 GMT <View Message Processor1> tid=0x331]
> Member at 172.17.0.10(server-2:104)<v2>:41002 gracefully left the distributed
> cache: departed membership view
> {noformat}
> As a results of server2 leaving the distributed system, a RemoveMemberTask is
> spun off to destroy the region:
> {noformat}
> [vm0] [fatal 2019/10/07 21:18:02.334 GMT <FederatingManager4> tid=0x34d]
> Uncaught exception in thread Thread[FederatingManager4,5,RMI Runtime]
> [vm0] org.apache.geode.cache.RegionDestroyedException:
> org.apache.geode.internal.cache.DistributedRegion[path='/_monitoringRegion_172.17.0.10<v2>41002';scope=DISTRIBUTED_NO_ACK';dataPolicy=REPLICATE]
> [vm0] at
> org.apache.geode.internal.cache.LocalRegion.checkRegionDestroyed(LocalRegion.java:7293)
> [vm0] at
> org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6157)
> [vm0] at
> org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1821)
> [vm0] at
> org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6128)
> [vm0] at
> org.apache.geode.internal.cache.LocalRegion.localDestroyRegion(LocalRegion.java:2251)
> [vm0] at
> org.apache.geode.internal.cache.AbstractRegion.localDestroyRegion(AbstractRegion.java:452)
> [vm0] at
> org.apache.geode.management.internal.FederatingManager.removeMemberArtifacts(FederatingManager.java:216)
> [vm0] at
> org.apache.geode.management.internal.FederatingManager.access$000(FederatingManager.java:67)
> [vm0] at
> org.apache.geode.management.internal.FederatingManager$RemoveMemberTask.run(FederatingManager.java:564)
> [vm0] at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> [vm0] at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> [vm0] at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}
> Immediately after that, the closing message is logged:
> {noformat}
> [vm0] [info 2019/10/07 21:18:02.342 GMT <Distribution Locator on
> 7f8e3ffa7dc2> tid=0x306] GemFireCache[id = 1017379193; isClosing = true;
> isShutDownAll = false; created = Mon Oct 07 21:17:55 GMT 2019; server =
> false; copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing.
> {noformat}
> The reason the localDestroyRegion failed is because the monitoringRegion had
> already been destroyed during cache close.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)