Kamilla Aslami created GEODE-9350:
-------------------------------------

             Summary: ShunnedMemberException after MemberJoinedEvent is triggered
                 Key: GEODE-9350
                 URL: https://issues.apache.org/jira/browse/GEODE-9350
             Project: Geode
          Issue Type: Bug
          Components: membership
    Affects Versions: 1.14.0, 1.15.0
            Reporter: Kamilla Aslami


While investigating GEODE-9070, we noticed a problem: a server tries to join 
a cluster and, soon after, membership fails with a ShunnedMemberException:
{noformat}
org.apache.geode.distributed.internal.direct.ShunnedMemberException: Member is being shunned: ccf730fb2b62(161)<v2>:41002
 at org.apache.geode.distributed.internal.direct.DirectChannel.getConnections(DirectChannel.java:469)
 at org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:283)
 at org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:190)
 at org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:550)
 at org.apache.geode.distributed.internal.DistributionImpl.directChannelSend(DistributionImpl.java:354)
 at org.apache.geode.distributed.internal.DistributionImpl.send(DistributionImpl.java:296)
 at org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2068)
 at org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:1983)
 at org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2028)
 at org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1085)
 at org.apache.geode.internal.cache.execute.StreamingFunctionOperation.getFunctionResultFrom(StreamingFunctionOperation.java:113)
 at org.apache.geode.internal.cache.execute.MemberFunctionExecutor.executeFunction(MemberFunctionExecutor.java:149)
 at org.apache.geode.internal.cache.execute.MemberFunctionExecutor.executeFunction(MemberFunctionExecutor.java:191)
 at org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:397)
 at org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:402)
 at org.apache.geode.modules.util.BootstrappingFunction.bootstrapMember(BootstrappingFunction.java:170)
 at org.apache.geode.modules.util.BootstrappingFunction.memberJoined(BootstrappingFunction.java:240)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberJoinedEvent.handleEvent(ClusterDistributionManager.java:2498)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2451)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2440)
 at org.apache.geode.distributed.internal.ClusterDistributionManager.handleMemberEvent(ClusterDistributionManager.java:1406)
 at org.apache.geode.distributed.internal.ClusterDistributionManager.access$200(ClusterDistributionManager.java:109)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEventInvoker.run(ClusterDistributionManager.java:1438)
 at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}

Further analysis showed that the ShunnedMemberException is thrown because 
GMSMembership.memberExists() returns false, which means that the member 
ccf730fb2b62(161)<v2>:41002 was not in the view. The stack trace shows that 
BootstrappingFunction.bootstrapMember() is executed on a MemberJoinedEvent, 
which is triggered by MembershipListener.newMemberConnected(). 
GMSMembership.processView() calls newMemberConnected() before the new view is 
installed, so the failure most likely happens because BootstrappingFunction 
receives the event before the view has actually been updated. A possible 
solution is to change GMSMembership.processView() so that it calls 
MembershipListener.newMemberConnected() only after the new view is installed.
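
The ordering problem can be illustrated with a minimal sketch. This is not 
Geode code: SketchMembership, its listener list, and both processView variants 
are hypothetical stand-ins, and the listener is invoked synchronously so the 
outcome is deterministic (in Geode the event is handled on a separate thread, 
so the real failure is a race).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for GMSMembership; not the real class.
class SketchMembership {
  private volatile Set<String> latestView = new HashSet<>();
  private final List<Consumer<String>> listeners = new ArrayList<>();

  void addListener(Consumer<String> listener) {
    listeners.add(listener);
  }

  // Lock-free lookup against whatever view is currently installed.
  boolean memberExists(String member) {
    return latestView.contains(member);
  }

  // Ordering described in the report: notify listeners, then install the view.
  void processViewNotifyFirst(Set<String> newView) {
    Set<String> joined = newMembers(newView);
    joined.forEach(this::notifyJoined);
    latestView = new HashSet<>(newView);
  }

  // Proposed ordering: install the view, then notify listeners.
  void processViewInstallFirst(Set<String> newView) {
    Set<String> joined = newMembers(newView);
    latestView = new HashSet<>(newView);
    joined.forEach(this::notifyJoined);
  }

  private Set<String> newMembers(Set<String> newView) {
    Set<String> joined = new HashSet<>(newView);
    joined.removeAll(latestView);
    return joined;
  }

  private void notifyJoined(String member) {
    for (Consumer<String> listener : listeners) {
      listener.accept(member);
    }
  }
}

class ViewOrderingSketch {
  // Returns what a listener sees from memberExists() during its callback.
  static boolean observedDuringCallback(boolean installFirst) {
    SketchMembership membership = new SketchMembership();
    boolean[] seen = new boolean[1];
    membership.addListener(member -> seen[0] = membership.memberExists(member));
    Set<String> view = Set.of("server-1");
    if (installFirst) {
      membership.processViewInstallFirst(view);
    } else {
      membership.processViewNotifyFirst(view);
    }
    return seen[0];
  }

  public static void main(String[] args) {
    System.out.println("notify first:  " + observedDuringCallback(false)); // false
    System.out.println("install first: " + observedDuringCallback(true));  // true
  }
}
```

With the notify-first ordering the listener's lookup fails, which corresponds 
to the condition that leads to the ShunnedMemberException above; with the 
install-first ordering the same lookup succeeds.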

This issue was introduced by the fix for GEODE-7245, which removed the 
latestView lock from GMSMembership.memberExists(). Before GEODE-7245, this 
method waited until GMSMembership.processView() released the lock, so the 
problem described above could not happen. GEODE-7245 was back-ported to 1.14.
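
A rough sketch of why the lock masked the ordering problem, under stated 
assumptions: LockedViewSketch and the latestViewLock field are hypothetical 
names, a ReentrantReadWriteLock stands in for whatever lock Geode actually 
used, and the event is handed to another thread via CompletableFuture. It only 
shows the pre-GEODE-7245 behaviour described above, where the reader blocks 
until view processing has finished.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in; not the real GMSMembership.
class LockedViewSketch {
  private final ReentrantReadWriteLock latestViewLock = new ReentrantReadWriteLock();
  private Set<String> latestView = new HashSet<>(); // guarded by latestViewLock

  // Reader takes the lock, so it waits while processView() holds the write lock.
  boolean memberExists(String member) {
    latestViewLock.readLock().lock();
    try {
      return latestView.contains(member);
    } finally {
      latestViewLock.readLock().unlock();
    }
  }

  // Dispatches the "member joined" check to another thread BEFORE installing
  // the view, but does both while holding the write lock.
  CompletableFuture<Boolean> processView(Set<String> newView, String joined) {
    latestViewLock.writeLock().lock();
    try {
      CompletableFuture<Boolean> seen =
          CompletableFuture.supplyAsync(() -> memberExists(joined));
      latestView = new HashSet<>(newView);
      return seen;
    } finally {
      latestViewLock.writeLock().unlock();
    }
  }

  // Runs one view change and reports what the asynchronous listener observed.
  static boolean demo() {
    LockedViewSketch membership = new LockedViewSketch();
    return membership.processView(Set.of("server-1"), "server-1").join();
  }

  public static void main(String[] args) {
    System.out.println("listener saw member: " + demo()); // true
  }
}
```

Because the asynchronous lookup cannot acquire the read lock until the write 
lock is released, it always observes the installed view, even though it was 
dispatched first. Without the read lock in memberExists(), the lookup may run 
before the view is installed.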



--
This message was sent by Atlassian Jira
(v8.3.4#803005)