[ 
https://issues.apache.org/jira/browse/GEODE-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494856#comment-17494856
 ] 

Barrett Oglesby edited comment on GEODE-9910 at 2/22/22, 6:35 PM:
------------------------------------------------------------------

With a modification to the product to simulate a failed JoinRequestMessage, I 
can reproduce this issue.
h3. Test
 # Start server 1 (becomes coordinator)
 # Start server 2
 # Play dead server 1
 # The servers disconnect from each other
 # Server 2 disconnects from the distributed system since it doesn't have quorum

When server 2 reconnects, it:
 - establishes quorum
 - starts the locator
 - is unable to join the distributed system (due to the modification I made)
 - attempts to reconnect again
 - fails because the locator is already started


was (Author: barry.oglesby):
With a modification to the product to simulate a failed JoinRequestMessage, I 
can reproduce this issue.

h3. Test
# Start server 1 (becomes coordinator)
# Start server 2
# Play dead server 2
# The servers disconnect from each other
# Server 2 disconnects from the distributed system since it doesn't have quorum

When server 2 reconnects, it:
 - establishes quorum
 - starts the locator
 - is unable to join the distributed system (due to the modification I made)
 - attempts to reconnect again
 - fails because the locator is already started

> Failure to auto-reconnect upon network partition
> ------------------------------------------------
>
>                 Key: GEODE-9910
>                 URL: https://issues.apache.org/jira/browse/GEODE-9910
>             Project: Geode
>          Issue Type: Bug
>    Affects Versions: 1.14.0
>            Reporter: Surya Mudundi
>            Assignee: Barrett Oglesby
>            Priority: Major
>              Labels: GeodeOperationAPI, blocks-1.15.0​, needsTriage
>         Attachments: geode-logs.zip
>
>
> Two node cluster with embedded locators failed to auto-reconnect when node-1 
> experienced network outage for couple of minutes and when node-1 recovered 
> from the outage, node-2 failed to auto-reconnect.
> node-2 tried to re-connect to node-1 as:
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #1.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #2.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Attempting to reconnect to the distributed system.  This is attempt #3.
> Finally reported below error after 3 attempts as:
> INFO  
> [org.apache.geode.logging.internal.LoggingProviderLoader]-[ReconnectThread] 
> [] Using org.apache.geode.logging.internal.SimpleLoggingProvider for service 
> org.apache.geode.logging.internal.spi.LoggingProvider
> INFO  [org.apache.geode.internal.InternalDataSerializer]-[ReconnectThread] [] 
> initializing InternalDataSerializer with 0 services
> INFO  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] performing a quorum check to see if location services can be started early
> INFO  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Quorum check passed - allowing location services to start early
> WARN  
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
>  [] Exception occurred while trying to connect the system during reconnect
> java.lang.IllegalStateException: A locator can not be created because one 
> already exists in this JVM.
>         at 
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:298)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:273)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.startInitLocator(InternalDistributedSystem.java:916)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:768)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2326)
>  ~[geode-core-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1187)
>  ~[geode-membership-1.14.0.jar:?]
>         at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1811)
>  ~[geode-membership-1.14.0.jar:?]
>         at java.lang.Thread.run(Thread.java:829) [?:?]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to