[
https://issues.apache.org/jira/browse/GEODE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903154#comment-16903154
]
Dan Smith commented on GEODE-7055:
----------------------------------
We found this in a test of the Tomcat session replication module. It turns out
the session module has a membership listener that sends this
BootstrappingFunction to *all* members as soon as they join, which is before
they send the startup message.
If a member does not have the BootstrappingFunction on the classpath, it will
try to send the above failure reply and hang. Our documentation is vague on
whether locators need this class on their classpath, so some users may not
have put it there. See
https://geode.apache.org/docs/guide/19/tools_modules/http_session_mgmt/tomcat_setting_up_the_module.html.
Before the changes in 00ed2f3c, we would only hang for 15 seconds, but now we
hang forever.
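A minimal sketch, in plain Java with no Geode dependencies, of why a member that lacks the class fails while deserializing the message, before any function code runs. The receiver has to resolve the function class by name, and a JVM without the session-module jar (for example, a locator started without it) cannot, so its only possible answer is a failure reply. The class `MissingFunctionClassDemo`, the enum `Outcome`, and the method `handleFunctionMessage` are hypothetical; `Class.forName` stands in for the lookup that fails in `ClassPathLoader.forName`.

```java
/**
 * Hypothetical model of a receiver deserializing a function message.
 * The function's class name arrives on the wire and must be resolved
 * locally; a JVM without that class (for example, a locator started
 * without the session-module jar) cannot resolve it.
 */
public class MissingFunctionClassDemo {

  /** Hypothetical outcome of handling one incoming function message. */
  public enum Outcome { EXECUTED, FAILURE_REPLY }

  public static Outcome handleFunctionMessage(String functionClassName) {
    try {
      // Stand-in for the class lookup that fails in ClassPathLoader.forName.
      Class.forName(functionClassName);
      // A real member would now instantiate and run the function.
      return Outcome.EXECUTED;
    } catch (ClassNotFoundException e) {
      // Deserialization failed, so the only possible response is a failure
      // reply; in Geode this path reaches Connection.sendFailureReply.
      return Outcome.FAILURE_REPLY;
    }
  }

  public static void main(String[] args) {
    // A class every JVM has, versus the module class a bare locator may lack.
    System.out.println(handleFunctionMessage("java.lang.String"));
    System.out.println(
        handleFunctionMessage("org.apache.geode.modules.util.BootstrappingFunction"));
  }
}
```

Run without the session-module jar on the classpath, the first call reports `EXECUTED` and the second reports `FAILURE_REPLY`.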
> Deadlock with StartupMessages if P2P error requiring a sendFailureReply
> ------------------------------------------------------------------------
>
> Key: GEODE-7055
> URL: https://issues.apache.org/jira/browse/GEODE-7055
> Project: Geode
> Issue Type: Bug
> Components: membership
> Reporter: Ernest Burghardt
> Assignee: Dan Smith
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> If an error or exception occurs on the P2P message thread that requires a
> FailureReply to be sent before the StartupResponse message has been received
> (on the P2P message thread), the failure reply will DEADLOCK on the call to
> org.apache.geode.distributed.internal.ClusterDistributionManager.waitUntilReadyToSendMsgs,
> because the StartupOperation is already in waitForReplies() for the
> StartupResponse.
> Below is an example of an exception triggering the DEADLOCK, followed by a
> thread dump of the P2P reader thread blocked in sendFailureReply:
>
> {code:java}
> [fatal 2019/08/05 22:47:06.462 UTC <P2P message reader for 10.0.8.10(cacheserver-28663bad-c0b0-41f7-b723-5a2425fa54ff:1)<v5>:56152(version:GEODE 1.9.0) shared unordered uid=63 port=49194> tid=0x25] Error deserializing message
> java.lang.ClassNotFoundException: org.apache.geode.modules.util.BootstrappingFunction
>     at org.apache.geode.internal.ClassPathLoader.forName(ClassPathLoader.java:180)
>     at org.apache.geode.internal.InternalDataSerializer.getCachedClass(InternalDataSerializer.java:3274)
>     at org.apache.geode.DataSerializer.readClass(DataSerializer.java:264)
>     at org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2398)
>     at org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2673)
>     at org.apache.geode.DataSerializer.readObject(DataSerializer.java:2968)
>     at org.apache.geode.internal.cache.MemberFunctionStreamingMessage.fromData(MemberFunctionStreamingMessage.java:277)
>     at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2372)
>     at org.apache.geode.internal.DSFIDFactory.create(DSFIDFactory.java:997)
>     at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2516)
>     at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2528)
>     at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3111)
>     at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2920)
>     at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1745)
>     at org.apache.geode.internal.tcp.Connection.run(Connection.java:1577)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> "P2P message reader for 10.0.8.10(cacheserver-28663bad-c0b0-41f7-b723-5a2425fa54ff:1)<v5>:56152(version:GEODE 1.9.0) shared unordered uid=63 port=49194" #37 daemon prio=10 os_prio=0 tid=0x00007f4a108bb800 nid=0x2a in Object.wait() [0x00007f4a0dca7000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0x00000006d39c4538> (a java.lang.Object)
>     at java.lang.Object.wait(Object.java:502)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager.waitUntilReadyToSendMsgs(ClusterDistributionManager.java:1212)
>     - locked <0x00000006d39c4538> (a java.lang.Object)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2816)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1528)
>     at org.apache.geode.distributed.internal.ReplyMessage.send(ReplyMessage.java:113)
>     at org.apache.geode.distributed.internal.ReplyMessage.send(ReplyMessage.java:86)
>     at org.apache.geode.internal.tcp.Connection.sendFailureReply(Connection.java:1954)
>     at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3162)
>     at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2920)
>     at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1745)
>     at org.apache.geode.internal.tcp.Connection.run(Connection.java:1577)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
>
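The cycle in the issue description can be modeled with a single reader thread that handles one connection's messages strictly in arrival order. This is a hypothetical sketch, not Geode code. `StartupDeadlockModel`, `runReader`, and the string message types are invented. The `readyToSendMsgs` flag stands in for the state that `waitUntilReadyToSendMsgs` waits on. The model also assumes, as the description suggests, that the StartupResponse would be delivered by the same P2P reader thread that is now blocked. The wait is bounded only so that the demo terminates. An unbounded `wait()` in the same spot would never return, because nothing else can set the flag.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical model of the GEODE-7055 cycle; none of this is Geode code.
 * One reader thread handles a connection's messages strictly in order.
 */
public class StartupDeadlockModel {

  private final Object readyLock = new Object();
  // Stand-in for the state waitUntilReadyToSendMsgs waits on; it becomes
  // true only once the StartupResponse has been processed.
  private boolean readyToSendMsgs = false;

  /** Bounded analogue of waitUntilReadyToSendMsgs so the demo terminates. */
  boolean waitUntilReadyToSendMsgs(long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    synchronized (readyLock) {
      while (!readyToSendMsgs) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return false; // an unbounded wait() here would never return
        }
        try {
          readyLock.wait(remaining);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
      return true;
    }
  }

  void onStartupResponse() {
    synchronized (readyLock) {
      readyToSendMsgs = true;
      readyLock.notifyAll();
    }
  }

  /** Returns true if every queued message was handled, false if stuck. */
  public boolean runReader(Deque<String> inbox, long timeoutMillis) {
    while (!inbox.isEmpty()) {
      String msg = inbox.poll();
      if (msg.equals("FUNCTION_WITH_MISSING_CLASS")) {
        // Deserialization failed, so a failure reply must go out first, but
        // sending blocks until startup is complete...
        if (!waitUntilReadyToSendMsgs(timeoutMillis)) {
          // ...and the StartupResponse that would complete it is still
          // queued behind this message on the same reader.
          return false;
        }
      } else if (msg.equals("STARTUP_RESPONSE")) {
        onStartupResponse();
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Function message arrives before the StartupResponse: the reader is stuck.
    Deque<String> stuck = new ArrayDeque<>();
    stuck.add("FUNCTION_WITH_MISSING_CLASS");
    stuck.add("STARTUP_RESPONSE");
    System.out.println(new StartupDeadlockModel().runReader(stuck, 200));

    // StartupResponse arrives first: the failure reply can be sent.
    Deque<String> ok = new ArrayDeque<>();
    ok.add("STARTUP_RESPONSE");
    ok.add("FUNCTION_WITH_MISSING_CLASS");
    System.out.println(new StartupDeadlockModel().runReader(ok, 200));
  }
}
```

With the function message queued ahead of the StartupResponse, `runReader` returns `false` and the StartupResponse is left unprocessed. With the order reversed, it returns `true`.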
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)