[ https://issues.apache.org/jira/browse/GEODE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903185#comment-16903185 ]
ASF subversion and git services commented on GEODE-7055:
--------------------------------------------------------
Commit 6dde1cdeffcdd169f26242c5a9ffcc2b40374e0b in geode's branch
refs/heads/release/1.10.0 from Dan Smith
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=6dde1cd ]
GEODE-7055: Don't send failure replies from a P2P reader thread
We were hitting a deadlock during startup if a P2P reader thread tried
to send a failure reply: it would block waiting for startup to finish,
but startup would not finish until the P2P reader thread could read a
startup response.
Send the failure reply in a separate thread, to make sure we always
unblock the P2P reader thread to read new messages.
(cherry picked from commit 1438b56bf7ef44e758bb4fc157dfca2cff4e2c99)
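A minimal sketch of the fix described above, in plain Java rather than Geode's code. The class `FailureReplyHandoff`, the string messages, and the `CountDownLatch` standing in for `ClusterDistributionManager.waitUntilReadyToSendMsgs` are all hypothetical. The reader loop hands each failure reply to a single-thread `ExecutorService`, so the reader stays free to read the StartupResponse that eventually unblocks the reply.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the GEODE-7055 fix; names are illustrative only
// and are not Geode's actual API.
class FailureReplyHandoff {
  // Stand-in for waitUntilReadyToSendMsgs(): opens once startup has finished,
  // which in this model happens when the StartupResponse is read.
  private final CountDownLatch readyToSendMsgs = new CountDownLatch(1);
  // Separate thread for failure replies, so the reader never blocks on them.
  private final ExecutorService replySender = Executors.newSingleThreadExecutor();
  final List<String> sentReplies = new CopyOnWriteArrayList<>();

  // Blocks until messages may be sent, then records the "sent" reply.
  private void sendFailureReply(String reason) throws InterruptedException {
    readyToSendMsgs.await();
    sentReplies.add("failure reply for " + reason);
  }

  // The reader loop: handles messages one at a time, in arrival order.
  void readMessages(List<String> inbox) {
    for (String msg : inbox) {
      if (msg.equals("StartupResponse")) {
        readyToSendMsgs.countDown(); // startup can now finish
      } else if (msg.startsWith("bad:")) {
        // Calling sendFailureReply(msg) directly here would block this thread
        // before it ever reads the StartupResponse queued behind msg.
        replySender.execute(() -> {
          try {
            sendFailureReply(msg);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
    }
  }

  // Lets queued replies finish, then stops the sender thread.
  boolean shutdownAndWait(long timeoutMs) {
    replySender.shutdown();
    try {
      return replySender.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    FailureReplyHandoff reader = new FailureReplyHandoff();
    // The failure happens before the StartupResponse has been read.
    reader.readMessages(List.of("bad:ClassNotFoundException", "StartupResponse"));
    reader.shutdownAndWait(5_000);
    System.out.println(reader.sentReplies); // [failure reply for bad:ClassNotFoundException]
  }
}
```

A single-thread executor is just one way to get a "separate thread"; it also keeps failure replies in the order the reader produced them.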
> Deadlock with StartupMessages if P2P error requiring a sendFailureReply
> ------------------------------------------------------------------------
>
> Key: GEODE-7055
> URL: https://issues.apache.org/jira/browse/GEODE-7055
> Project: Geode
> Issue Type: Bug
> Components: membership
> Reporter: Ernest Burghardt
> Assignee: Dan Smith
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> If an error or exception occurs on the P2P message thread that requires a
> FailureReply to be sent before the StartupResponse message has been received
> (on the P2P message thread), the failure reply will DEADLOCK on the call to
> org.apache.geode.distributed.internal.ClusterDistributionManager.waitUntilReadyToSendMsgs,
> because the StartupOperation is already in waitForReplies() for the
> StartupResponse.
> Below is an example of an exception triggering the DEADLOCK:
>
> {code:java}
> [fatal 2019/08/05 22:47:06.462 UTC <P2P message reader for 10.0.8.10(cacheserver-28663bad-c0b0-41f7-b723-5a2425fa54ff:1)<v5>:56152(version:GEODE 1.9.0) shared unordered uid=63 port=49194> tid=0x25] Error deserializing message
> java.lang.ClassNotFoundException: org.apache.geode.modules.util.BootstrappingFunction
> 	at org.apache.geode.internal.ClassPathLoader.forName(ClassPathLoader.java:180)
> 	at org.apache.geode.internal.InternalDataSerializer.getCachedClass(InternalDataSerializer.java:3274)
> 	at org.apache.geode.DataSerializer.readClass(DataSerializer.java:264)
> 	at org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2398)
> 	at org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2673)
> 	at org.apache.geode.DataSerializer.readObject(DataSerializer.java:2968)
> 	at org.apache.geode.internal.cache.MemberFunctionStreamingMessage.fromData(MemberFunctionStreamingMessage.java:277)
> 	at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2372)
> 	at org.apache.geode.internal.DSFIDFactory.create(DSFIDFactory.java:997)
> 	at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2516)
> 	at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2528)
> 	at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3111)
> 	at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2920)
> 	at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1745)
> 	at org.apache.geode.internal.tcp.Connection.run(Connection.java:1577)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
>
> "P2P message reader for 10.0.8.10(cacheserver-28663bad-c0b0-41f7-b723-5a2425fa54ff:1)<v5>:56152(version:GEODE 1.9.0) shared unordered uid=63 port=49194" #37 daemon prio=10 os_prio=0 tid=0x00007f4a108bb800 nid=0x2a in Object.wait() [0x00007f4a0dca7000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000006d39c4538> (a java.lang.Object)
> 	at java.lang.Object.wait(Object.java:502)
> 	at org.apache.geode.distributed.internal.ClusterDistributionManager.waitUntilReadyToSendMsgs(ClusterDistributionManager.java:1212)
> 	- locked <0x00000006d39c4538> (a java.lang.Object)
> 	at org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2816)
> 	at org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1528)
> 	at org.apache.geode.distributed.internal.ReplyMessage.send(ReplyMessage.java:113)
> 	at org.apache.geode.distributed.internal.ReplyMessage.send(ReplyMessage.java:86)
> 	at org.apache.geode.internal.tcp.Connection.sendFailureReply(Connection.java:1954)
> 	at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3162)
> 	at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2920)
> 	at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1745)
> 	at org.apache.geode.internal.tcp.Connection.run(Connection.java:1577)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
>
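The interaction described in the issue can be modeled in a few lines of plain Java, with hypothetical names throughout (`StartupDeadlockDemo`, the string messages, and the two `CountDownLatch`es are not Geode's API). The startup thread waits for the StartupResponse, as StartupOperation does in waitForReplies(), and the calling thread plays the P2P reader, sending its failure reply inline by waiting on a latch that only opens once startup has finished, as in waitUntilReadyToSendMsgs. The real waits are unbounded; this sketch adds a timeout so it reports the hang instead of hanging.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the GEODE-7055 deadlock; names are illustrative only.
class StartupDeadlockDemo {
  // What the startup thread waits on (cf. StartupOperation.waitForReplies()):
  // opened only when the reader reads the StartupResponse.
  private final CountDownLatch startupResponseRead = new CountDownLatch(1);
  // What an inline failure reply waits on (cf. waitUntilReadyToSendMsgs()):
  // opened only when startup finishes.
  private final CountDownLatch readyToSendMsgs = new CountDownLatch(1);

  // Simulates a reader that sends failure replies inline. Returns true if it
  // processed every message, false if it got stuck waiting to send.
  boolean run(List<String> inbox, long timeoutMs) {
    Thread startup = new Thread(() -> {
      try {
        if (startupResponseRead.await(timeoutMs, TimeUnit.MILLISECONDS)) {
          readyToSendMsgs.countDown(); // startup finished; sending is allowed
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    startup.setDaemon(true); // never keeps the JVM alive on its own
    startup.start();
    try {
      for (String msg : inbox) { // the calling thread plays the P2P reader
        if (msg.equals("StartupResponse")) {
          startupResponseRead.countDown();
        } else if (msg.startsWith("bad:")) {
          // Bounded here only so the demo can report the hang.
          if (!readyToSendMsgs.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // the StartupResponse behind msg is never read
          }
        }
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    // A failure reply is needed before the StartupResponse is read: stuck.
    System.out.println(new StartupDeadlockDemo()
        .run(List.of("bad:ClassNotFoundException", "StartupResponse"), 200)); // false
    // The failure comes after the StartupResponse: the inline reply goes out.
    System.out.println(new StartupDeadlockDemo()
        .run(List.of("StartupResponse", "bad:ClassNotFoundException"), 5_000)); // true
  }
}
```

The outcome depends only on message order: the same inline reply is harmless once startup has completed, which is why the hang surfaced only during startup.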
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)