@bschuchardt I'm actually a little worried about merging this change in. Without this change, LynnG encountered a failure which I think might occur more if we merge this in.
In this run we are expecting ForcedDisconnects, but they don't occur until after the hang is declared. The difference between similar hangs and this is the appearance of these "Recursive call to appender" messages: ``` wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,554 Geode Failure Detection thread 6 ERROR Recursive call to appender ALERT ``` LynnG analysis: *** Test is dropping the network ``` wan_bridgeNetworkPartition1-0812-213411/vm_9_locator_1_2_host2_13435.log: [info 2019/08/12 21:37:46.265 PDT <vm_9_thr_13_locator_1_2_host2_13435> tid=0x94] Dropping network connection from rs-FullRegression13040513a0i3large-hydra-client-46 to rs-FullRegression13040513a0i3large-hydra-client-1 and from rs-FullRegression13040513a0i3large-hydra-client-1 to rs-FullRegression13040513a0i3large-hydra-client-46 ``` *** Here are the appender messages ``` wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,554 Geode Failure Detection thread 6 ERROR Recursive call to appender ALERT wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,554 Geode Failure Detection thread 4 ERROR Recursive call to appender ALERT wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,554 Geode Failure Detection thread 5 ERROR Recursive call to appender ALERT wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,556 Geode Failure Detection thread 5 ERROR Recursive call to appender ALERT wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,558 Geode Failure Detection thread 4 ERROR Recursive call to appender ALERT wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,559 Geode Failure Detection thread 4 ERROR Recursive call to appender ALERT ``` *** network is dropped ``` wan_bridgeNetworkPartition1-0812-213411/vm_9_locator_1_2_host2_13435.log: [info 2019/08/12 21:37:47.136 PDT <vm_9_thr_13_locator_1_2_host2_13435> tid=0x94] Dropped network connection from rs-FullRegression13040513a0i3large-hydra-client-46 to rs-FullRegression13040513a0i3large-hydra-client-1 and from rs-FullRegression13040513a0i3large-hydra-client-1 to rs-FullRegression13040513a0i3large-hydra-client-46 ``` *** hang is declared ``` wan_bridgeNetworkPartition1-0812-213411/taskmaster_15326.log: [severe 2019/08/12 21:40:44.626 PDT <master_15326> tid=0x1] Result for vm_12_thr_16_wan1Lose_host1_14084: TASK[0] splitBrain.NetworkPartitionTest.HydraTask_doEntryOperations: HANG a client exceeded max result wait sec: 180 ``` *** now we see ForcedDisconnects ``` [fatal 2019/08/12 21:40:57.826 PDT <unicast receiver,rs-FullRegression13040513a0i3large-hydra-client-46-43284> tid=0x24] Membership service failure: Membership coordinator 10.32.110.145(gemfire3_host1_14066:14066:locator)<ec><v2>:41000 has declared that a network partition has occurred org.apache.geode.ForcedDisconnectException: Membership coordinator 10.32.110.145(gemfire3_host1_14066:14066:locator)<ec><v2>:41000 has declared that a network partition has occurred at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.forceDisconnect(GMSMembershipManager.java:2506) at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1106) at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processMessage(GMSJoinLeave.java:1481) at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1328) at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1266) at org.jgroups.JChannel.invokeCallback(JChannel.java:816) at org.jgroups.JChannel.up(JChannel.java:741) at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030) at org.jgroups.protocols.FRAG2.up(FRAG2.java:165) at org.jgroups.protocols.FlowControl.up(FlowControl.java:390) at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1077) at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:792) at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:433) at org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.up(StatRecorder.java:73) at org.apache.geode.distributed.internal.membership.gms.messenger.AddressManager.up(AddressManager.java:72) at org.jgroups.protocols.TP.passMessageUp(TP.java:1658) at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1876) at org.jgroups.util.DirectExecutor.execute(DirectExecutor.java:10) at org.jgroups.protocols.TP.handleSingleMessage(TP.java:1789) at org.jgroups.protocols.TP.receive(TP.java:1714) at org.apache.geode.distributed.internal.membership.gms.messenger.Transport.receive(Transport.java:152) at org.jgroups.protocols.UDP$PacketReceiver.run(UDP.java:701) at java.lang.Thread.run(Thread.java:748) ``` [ Full content available at: https://github.com/apache/geode/pull/3904 ] This message was relayed via gitbox.apache.org for notifications@geode.apache.org