@bschuchardt I'm actually a little worried about merging this change. Even without it, LynnG encountered a failure that I think might occur more often if we merge this in.

In this run we are expecting ForcedDisconnects, but they don't occur until 
after the hang is declared.

What distinguishes this failure from similar hangs is the appearance of these "Recursive call to appender" messages:
```
wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,554 Geode Failure Detection thread 6 ERROR Recursive call to appender ALERT
```
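I believe the "Recursive call to appender" text is reported through Log4j2's status logger when an appender is invoked again on a thread that is already inside that same appender, in which case the nested event is not appended. The sketch below is a simplified model of such a per-thread reentrancy guard, assuming it behaves roughly like the one in Log4j2. It is not Log4j2's or Geode's actual code; `GuardedAppender` and its `setBody` hook are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class RecursionGuardDemo {

  // Hypothetical, simplified appender with a per-thread reentrancy guard.
  static final class GuardedAppender {
    private final String name;
    // Hypothetical hook: the work done for each event (e.g. sending an alert),
    // which may itself log and route back into this appender.
    private Consumer<String> body = msg -> {};
    // True while the current thread is inside append().
    private final ThreadLocal<Boolean> inAppend = ThreadLocal.withInitial(() -> false);
    // Events actually handled and errors reported (not thread-safe; single-threaded demo).
    final List<String> delivered = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    GuardedAppender(String name) {
      this.name = name;
    }

    void setBody(Consumer<String> body) {
      this.body = body;
    }

    void append(String message) {
      if (inAppend.get()) {
        // Nested call on the same thread: report it and drop the event.
        errors.add(Thread.currentThread().getName() + " ERROR Recursive call to appender " + name);
        return;
      }
      inAppend.set(true);
      try {
        delivered.add(message);
        body.accept(message);
      } finally {
        inAppend.set(false);
      }
    }
  }

  public static void main(String[] args) {
    GuardedAppender alert = new GuardedAppender("ALERT");
    // Handling an alert logs a message that is routed back to the same appender.
    alert.setBody(msg -> alert.append("failed to deliver: " + msg));
    alert.append("member suspected");
    System.out.println("delivered=" + alert.delivered);
    System.out.println("errors=" + alert.errors);
  }
}
```

Running `main` prints `delivered=[member suspected]` and `errors=[main ERROR Recursive call to appender ALERT]`: the outer event is handled, while the nested one is only reported.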

LynnG's analysis:

*** Test is dropping the network
```
wan_bridgeNetworkPartition1-0812-213411/vm_9_locator_1_2_host2_13435.log: [info 2019/08/12 21:37:46.265 PDT <vm_9_thr_13_locator_1_2_host2_13435> tid=0x94] Dropping network connection from rs-FullRegression13040513a0i3large-hydra-client-46 to rs-FullRegression13040513a0i3large-hydra-client-1 and from rs-FullRegression13040513a0i3large-hydra-client-1 to rs-FullRegression13040513a0i3large-hydra-client-46
```
*** Here are the appender messages
```
wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,554 Geode Failure Detection thread 6 ERROR Recursive call to appender ALERT

wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,554 Geode Failure Detection thread 4 ERROR Recursive call to appender ALERT

wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,554 Geode Failure Detection thread 5 ERROR Recursive call to appender ALERT

wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,556 Geode Failure Detection thread 5 ERROR Recursive call to appender ALERT

wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,558 Geode Failure Detection thread 4 ERROR Recursive call to appender ALERT

wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,559 Geode Failure Detection thread 4 ERROR Recursive call to appender ALERT
```
*** network is dropped
```
wan_bridgeNetworkPartition1-0812-213411/vm_9_locator_1_2_host2_13435.log: [info 2019/08/12 21:37:47.136 PDT <vm_9_thr_13_locator_1_2_host2_13435> tid=0x94] Dropped network connection from rs-FullRegression13040513a0i3large-hydra-client-46 to rs-FullRegression13040513a0i3large-hydra-client-1 and from rs-FullRegression13040513a0i3large-hydra-client-1 to rs-FullRegression13040513a0i3large-hydra-client-46
```
*** hang is declared
```
wan_bridgeNetworkPartition1-0812-213411/taskmaster_15326.log: [severe 2019/08/12 21:40:44.626 PDT <master_15326> tid=0x1] Result for vm_12_thr_16_wan1Lose_host1_14084: TASK[0] splitBrain.NetworkPartitionTest.HydraTask_doEntryOperations: HANG a client exceeded max result wait sec: 180
```
*** now we see ForcedDisconnects
```
[fatal 2019/08/12 21:40:57.826 PDT <unicast receiver,rs-FullRegression13040513a0i3large-hydra-client-46-43284> tid=0x24] Membership service failure: Membership coordinator 10.32.110.145(gemfire3_host1_14066:14066:locator)<ec><v2>:41000 has declared that a network partition has occurred
org.apache.geode.ForcedDisconnectException: Membership coordinator 10.32.110.145(gemfire3_host1_14066:14066:locator)<ec><v2>:41000 has declared that a network partition has occurred
        at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.forceDisconnect(GMSMembershipManager.java:2506)
        at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1106)
        at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processMessage(GMSJoinLeave.java:1481)
        at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1328)
        at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1266)
        at org.jgroups.JChannel.invokeCallback(JChannel.java:816)
        at org.jgroups.JChannel.up(JChannel.java:741)
        at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
        at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
        at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1077)
        at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:792)
        at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:433)
        at org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.up(StatRecorder.java:73)
        at org.apache.geode.distributed.internal.membership.gms.messenger.AddressManager.up(AddressManager.java:72)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:1658)
        at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1876)
        at org.jgroups.util.DirectExecutor.execute(DirectExecutor.java:10)
        at org.jgroups.protocols.TP.handleSingleMessage(TP.java:1789)
        at org.jgroups.protocols.TP.receive(TP.java:1714)
        at org.apache.geode.distributed.internal.membership.gms.messenger.Transport.receive(Transport.java:152)
        at org.jgroups.protocols.UDP$PacketReceiver.run(UDP.java:701)
        at java.lang.Thread.run(Thread.java:748)
```
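To put numbers on the ordering, the sketch below computes the gaps between the PDT-stamped entries quoted above, all from 2019/08/12. The "Recursive call to appender" lines are left out because they carry no time zone. `PartitionTimeline` is just an illustrative name.

```java
import java.time.Duration;
import java.time.LocalTime;

public class PartitionTimeline {
  // Timestamps copied from the PDT-stamped log lines quoted above (same day).
  static final LocalTime DROP_STARTED = LocalTime.parse("21:37:46.265");
  static final LocalTime DROP_DONE = LocalTime.parse("21:37:47.136");
  static final LocalTime HANG_DECLARED = LocalTime.parse("21:40:44.626");
  static final LocalTime FORCED_DISCONNECT = LocalTime.parse("21:40:57.826");

  // Elapsed milliseconds from one log timestamp to a later one.
  static long millisBetween(LocalTime from, LocalTime to) {
    return Duration.between(from, to).toMillis();
  }

  public static void main(String[] args) {
    // prints 177490 ms
    System.out.println("network dropped -> hang declared: "
        + millisBetween(DROP_DONE, HANG_DECLARED) + " ms");
    // prints 13200 ms
    System.out.println("hang declared -> ForcedDisconnect: "
        + millisBetween(HANG_DECLARED, FORCED_DISCONNECT) + " ms");
    // prints 190690 ms
    System.out.println("network dropped -> ForcedDisconnect: "
        + millisBetween(DROP_DONE, FORCED_DISCONNECT) + " ms");
  }
}
```

By this arithmetic the hang was declared about 177.5 s after the drop completed, and the ForcedDisconnect quoted above followed 13.2 s later, roughly 190.7 s after the drop.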

[ Full content available at: https://github.com/apache/geode/pull/3904 ]