Kirk Lund created GEODE-7152:
--------------------------------

             Summary: AlertAppender recursive call causes ForcedDisconnect to 
hang
                 Key: GEODE-7152
                 URL: https://issues.apache.org/jira/browse/GEODE-7152
             Project: Geode
          Issue Type: Bug
          Components: management
            Reporter: Kirk Lund


AlertAppender uses a ThreadLocal to prevent recursive calls from actually doing 
anything. However, a recent upgrade to our log4j dependencies seems to have 
changed the behavior such that log4j refuses to invoke {{doAppend}} if the 
thread is currently handling a {{sendAlert}} initiated from {{doAppend}}. To 
fix this bug, we will need to remove the use of our ThreadLocal from 
{{doAppend}} and change {{sendAlert}} to be asynchronous.

In this run we are expecting ForcedDisconnects, but they don't occur until 
after the hang is declared.

The appearance of these "Recursive call to appender" messages is new:
{noformat}
wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 
21:37:47,554 Geode Failure Detection thread 6 ERROR Recursive call to appender 
ALERT
{noformat}
*** Test is dropping the network
{noformat}
wan_bridgeNetworkPartition1-0812-213411/vm_9_locator_1_2_host2_13435.log: [info 
2019/08/12 21:37:46.265 PDT <vm_9_thr_13_locator_1_2_host2_13435> tid=0x94] 
Dropping network connection from 
rs-FullRegression13040513a0i3large-hydra-client-46 to 
rs-FullRegression13040513a0i3large-hydra-client-1 and from 
rs-FullRegression13040513a0i3large-hydra-client-1 to 
rs-FullRegression13040513a0i3large-hydra-client-46
{noformat}
*** Here are the appender messages
{noformat}
wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 
21:37:47,554 Geode Failure Detection thread 6 ERROR Recursive call to appender 
ALERT

wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 
21:37:47,554 Geode Failure Detection thread 4 ERROR Recursive call to appender 
ALERT

wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 
21:37:47,554 Geode Failure Detection thread 5 ERROR Recursive call to appender 
ALERT

wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 
21:37:47,556 Geode Failure Detection thread 5 ERROR Recursive call to appender 
ALERT

wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 
21:37:47,558 Geode Failure Detection thread 4 ERROR Recursive call to appender 
ALERT

wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 
21:37:47,559 Geode Failure Detection thread 4 ERROR Recursive call to appender 
ALERT
{noformat}
*** network is dropped
{noformat}
wan_bridgeNetworkPartition1-0812-213411/vm_9_locator_1_2_host2_13435.log: [info 
2019/08/12 21:37:47.136 PDT <vm_9_thr_13_locator_1_2_host2_13435> tid=0x94] 
Dropped network connection from 
rs-FullRegression13040513a0i3large-hydra-client-46 to 
rs-FullRegression13040513a0i3large-hydra-client-1 and from 
rs-FullRegression13040513a0i3large-hydra-client-1 to 
rs-FullRegression13040513a0i3large-hydra-client-46
{noformat}
*** hang is declared
{noformat}
wan_bridgeNetworkPartition1-0812-213411/taskmaster_15326.log: [severe 
2019/08/12 21:40:44.626 PDT <master_15326> tid=0x1] Result for 
vm_12_thr_16_wan1Lose_host1_14084: TASK[0] 
splitBrain.NetworkPartitionTest.HydraTask_doEntryOperations: HANG a client 
exceeded max result wait sec: 180
{noformat}
*** now we see ForcedDisconnects
{noformat}
[fatal 2019/08/12 21:40:57.826 PDT <unicast 
receiver,rs-FullRegression13040513a0i3large-hydra-client-46-43284> tid=0x24] 
Membership service failure: Membership coordinator 
10.32.110.145(gemfire3_host1_14066:14066:locator)<ec><v2>:41000 has declared 
that a network partition has occurred
org.apache.geode.ForcedDisconnectException: Membership coordinator 
10.32.110.145(gemfire3_host1_14066:14066:locator)<ec><v2>:41000 has declared 
that a network partition has occurred
        at 
org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.forceDisconnect(GMSMembershipManager.java:2506)
        at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1106)
        at 
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processMessage(GMSJoinLeave.java:1481)
        at 
org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1328)
        at 
org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1266)
        at org.jgroups.JChannel.invokeCallback(JChannel.java:816)
        at org.jgroups.JChannel.up(JChannel.java:741)
        at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
        at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
        at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1077)
        at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:792)
        at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:433)
        at 
org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.up(StatRecorder.java:73)
        at 
org.apache.geode.distributed.internal.membership.gms.messenger.AddressManager.up(AddressManager.java:72)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:1658)
        at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1876)
        at org.jgroups.util.DirectExecutor.execute(DirectExecutor.java:10)
        at org.jgroups.protocols.TP.handleSingleMessage(TP.java:1789)
        at org.jgroups.protocols.TP.receive(TP.java:1714)
        at 
org.apache.geode.distributed.internal.membership.gms.messenger.Transport.receive(Transport.java:152)
        at org.jgroups.protocols.UDP$PacketReceiver.run(UDP.java:701)
        at java.lang.Thread.run(Thread.java:748)
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

Reply via email to