@bschuchardt I'm actually a little worried about merging this change in.
Without this change, LynnG hit a failure that I think might occur more often
if we merge this in.
In this run we are expecting ForcedDisconnects, but they don't occur until
after the hang is declared.
What distinguishes this hang from similar ones is the appearance of these
"Recursive call to appender" messages:
wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12
21:37:47,554 Geode Failure Detection thread 6 ERROR Recursive call to appender
ALERT
*** Test is dropping the network
wan_bridgeNetworkPartition1-0812-213411/vm_9_locator_1_2_host2_13435.log: [info
2019/08/12 21:37:46.265 PDT <vm_9_thr_13_locator_1_2_host2_13435> tid=0x94]
Dropping network connection from
rs-FullRegression13040513a0i3large-hydra-client-46 to
rs-FullRegression13040513a0i3large-hydra-client-1 and from
rs-FullRegression13040513a0i3large-hydra-client-1 to
rs-FullRegression13040513a0i3large-hydra-client-46
*** Here are the appender messages
wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12
21:37:47,554 Geode Failure Detection thread 6 ERROR Recursive call to appender
ALERT
wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12
21:37:47,554 Geode Failure Detection thread 4 ERROR Recursive call to appender
ALERT
wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12
21:37:47,554 Geode Failure Detection thread 5 ERROR Recursive call to appender
ALERT
wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12
21:37:47,556 Geode Failure Detection thread 5 ERROR Recursive call to appender
ALERT
wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12
21:37:47,558 Geode Failure Detection thread 4 ERROR Recursive call to appender
ALERT
wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12
21:37:47,559 Geode Failure Detection thread 4 ERROR Recursive call to appender
ALERT
*** network is dropped
wan_bridgeNetworkPartition1-0812-213411/vm_9_locator_1_2_host2_13435.log: [info
2019/08/12 21:37:47.136 PDT <vm_9_thr_13_locator_1_2_host2_13435> tid=0x94]
Dropped network connection from
rs-FullRegression13040513a0i3large-hydra-client-46 to
rs-FullRegression13040513a0i3large-hydra-client-1 and from
rs-FullRegression13040513a0i3large-hydra-client-1 to
rs-FullRegression13040513a0i3large-hydra-client-46
*** hang is declared
wan_bridgeNetworkPartition1-0812-213411/taskmaster_15326.log: [severe
2019/08/12 21:40:44.626 PDT <master_15326> tid=0x1] Result for
vm_12_thr_16_wan1Lose_host1_14084: TASK[0]
splitBrain.NetworkPartitionTest.HydraTask_doEntryOperations: HANG a client
exceeded max result wait sec: 180
*** now we see ForcedDisconnects
```
[fatal 2019/08/12 21:40:57.826 PDT <unicast
receiver,rs-FullRegression13040513a0i3large-hydra-client-46-43284> tid=0x24]
Membership service failure: Membership coordinator
10.32.110.145(gemfire3_host1_14066:14066:locator)<ec><v2>:41000 has declared
that a network partition has occurred
org.apache.geode.ForcedDisconnectException: Membership coordinator
10.32.110.145(gemfire3_host1_14066:14066:locator)<ec><v2>:41000 has declared
that a network partition has occurred
at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.forceDisconnect(GMSMembershipManager.java:2506)
at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1106)
at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processMessage(GMSJoinLeave.java:1481)
at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1328)
at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1266)
at org.jgroups.JChannel.invokeCallback(JChannel.java:816)
at org.jgroups.JChannel.up(JChannel.java:741)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1077)
at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:792)
at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:433)
at org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.up(StatRecorder.java:73)
at org.apache.geode.distributed.internal.membership.gms.messenger.AddressManager.up(AddressManager.java:72)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1658)
at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1876)
at org.jgroups.util.DirectExecutor.execute(DirectExecutor.java:10)
at org.jgroups.protocols.TP.handleSingleMessage(TP.java:1789)
at org.jgroups.protocols.TP.receive(TP.java:1714)
at org.apache.geode.distributed.internal.membership.gms.messenger.Transport.receive(Transport.java:152)
at org.jgroups.protocols.UDP$PacketReceiver.run(UDP.java:701)
at java.lang.Thread.run(Thread.java:748)
```
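To put numbers on the timeline above, here is a rough sketch that computes the gaps between the three timestamps quoted from the logs (network dropped, hang declared, first ForcedDisconnect). The class and method names (`TimelineGaps`, `millisBetween`) are made up for illustration and are not part of Geode or hydra; only the timestamps come from the log lines.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Geode or hydra: measures the gaps
// between the events quoted in the log excerpts above.
public class TimelineGaps {
    // Timestamp layout used by the hydra/Geode log lines, e.g. "2019/08/12 21:37:47.136".
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss.SSS");

    // Milliseconds elapsed from one log timestamp to another.
    public static long millisBetween(String from, String to) {
        LocalDateTime start = LocalDateTime.parse(from, FMT);
        LocalDateTime end = LocalDateTime.parse(to, FMT);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // Values copied from the log lines above (all PDT, same day).
        String dropped = "2019/08/12 21:37:47.136";          // "Dropped network connection"
        String hangDeclared = "2019/08/12 21:40:44.626";     // taskmaster reports HANG
        String forcedDisconnect = "2019/08/12 21:40:57.826"; // ForcedDisconnectException

        System.out.println("dropped -> hang declared (ms): "
            + millisBetween(dropped, hangDeclared));
        System.out.println("hang declared -> ForcedDisconnect (ms): "
            + millisBetween(hangDeclared, forcedDisconnect));
    }
}
```

With these inputs the two gaps come out to 177490 ms and 13200 ms: the hang was declared about 177 seconds after the network was dropped, and the ForcedDisconnect followed roughly 13 seconds later.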
[ Full content available at: https://github.com/apache/geode/pull/3904 ]