[ https://issues.apache.org/jira/browse/GEODE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125352#comment-17125352 ]
Kirk Lund edited comment on GEODE-8131 at 6/4/20, 5:32 PM: ----------------------------------------------------------- The following three commits are necessary to fix GEODE-8131: {noformat} commit 36cb801582b4168a319586ebd3a9cdeee4ec2d5c Author: Dan Smith <upthewatersp...@apache.org> Date: Mon Dec 30 14:15:51 2019 -0800 GEODE-7623: Preventing recursion in alert generation After the change to send alerts in a executor, there was a possibiltiy for sending alerts to generate new alerts, resulting an in infinite loop. Setting the AlertingAction threadlocal in the executor thread that is sending alerts, to prevent recursion. Adding a test that we do not generate recursive alert messages. {noformat} {noformat} commit 8944e897eeef3e9fcb79d97959c431595f2d9a35 Author: Dan Smith <upthewatersp...@apache.org> Date: Tue Dec 24 11:00:12 2019 -0800 GEODE-7545: Remove dependency on AltertingAction from membership The only place where JGroupsMessenger called AlertingAction was when sending a message. The only message that could be sent while that thread local is true is AlertListenerMessage. Setting isHighPriority to true on AlertListenerMessage instead. JGroupsMessenger is the only place that flag is checked, so this should have the same effect as testing the thread local. {noformat} {noformat} commit dffcb9446aef09c7bf6e626121f4d2ec5c74586f Author: Kirk Lund <kl...@apache.org> Date: Tue Sep 17 10:39:05 2019 -0700 GEODE-7152: Send alert messages using executor AlertAppender uses a ThreadLocal to prevent recursive calls from actually doing anything. However, a recent upgrade to our log4j dependencies seems to have changed the behavior such that log4j refuses to invoke doAppend if the thread is currently handling a sendAlert initiated from doAppend. To fix this bug, sendAlert must be async. {noformat} {noformat} commit dffcb9446aef09c7bf6e626121f4d2ec5c74586f Author: Kirk Lund <kl...@apache.org> Date: Tue Sep 17 10:39:05 2019 -0700 GEODE-7152: Send alert messages using executor AlertAppender uses a ThreadLocal to prevent recursive calls from actually doing anything. However, a recent upgrade to our log4j dependencies seems to have changed the behavior such that log4j refuses to invoke doAppend if the thread is currently handling a sendAlert initiated from doAppend. To fix this bug, sendAlert must be async. {noformat} was (Author: klund): The following commit should fix GEODE-8131. As far as I can tell, this commit has not been included in any releases. {noformat} commit dffcb9446aef09c7bf6e626121f4d2ec5c74586f Author: Kirk Lund <kl...@apache.org> Date: Tue Sep 17 10:39:05 2019 -0700 GEODE-7152: Send alert messages using executor AlertAppender uses a ThreadLocal to prevent recursive calls from actually doing anything. However, a recent upgrade to our log4j dependencies seems to have changed the behavior such that log4j refuses to invoke doAppend if the thread is currently handling a sendAlert initiated from doAppend. To fix this bug, sendAlert must be async. {noformat} > alert service hangs, blocking cache operations > ---------------------------------------------- > > Key: GEODE-8131 > URL: https://issues.apache.org/jira/browse/GEODE-8131 > Project: Geode > Issue Type: Bug > Components: logging > Affects Versions: 1.0.0-incubating, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5.0, > 1.6.0, 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0 > Reporter: Bruce J Schuchardt > Assignee: Kirk Lund > Priority: Major > Labels: GeodeOperationAPI > Fix For: 1.12.0 > > > This v1.8 TcpConduit reader thread was blocked in a production system. It > had experienced a deserialization error and was trying to log the exception. > A Manager was present in the cluster and had registered as an alert listener. > Another thread was blocked sending something on the shared/unordered > connection that this alert should be sent on. This persisted for over 6 > hours and we never saw the serialization exception in the log file. > Consequently we had to recommend setting the alert level to None and have > them run into the serialization problem again. > This is a serious flaw in the alerting system and it's caused us grief many > times. The alerting system should not block other threads. Maybe a > background thread could consume and transmit alerts to alert-listeners? > > {noformat} > "P2P message reader for 10.236.28.120(servername-removed)<v491>:56152 shared > unordered uid=9 port=41204" tid=0xd49 (in native) java.lang.Thread.State: > RUNNABLE at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at > sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) at > sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at > sun.nio.ch.IOUtil.write(IOUtil.java:51) at > sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) - locked > java.lang.Object@24528b9b at > org.apache.geode.internal.tcp.Connection.nioWriteFully(Connection.java:3291) > - locked java.lang.Object@42a1a79b at > org.apache.geode.internal.tcp.Connection.sendPreserialized(Connection.java:2527) > at org.apache.geode.internal.tcp.MsgStreamer.realFlush(MsgStreamer.java:319) > at > org.apache.geode.internal.tcp.MsgStreamer.writeMessage(MsgStreamer.java:244) > at > org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:393) > at > org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:250) > at > org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:615) > at > org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.directChannelSend(GMSMembershipManager.java:1717) > at > org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.send(GMSMembershipManager.java:1898) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2878) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:2798) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2837) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1531) > at > org.apache.geode.internal.alerting.AlertMessaging.sendAlert(AlertMessaging.java:75) > at > org.apache.geode.internal.logging.log4j.AlertAppender.sendAlertMessage(AlertAppender.java:188) > at > org.apache.geode.internal.logging.log4j.AlertAppender.doAppend(AlertAppender.java:163) > at > org.apache.geode.internal.logging.log4j.AlertAppender.lambda$append$0(AlertAppender.java:159) > at > org.apache.geode.internal.logging.log4j.AlertAppender$$Lambda$168/1102181662.run(Unknown > Source) at > org.apache.geode.internal.alerting.AlertingAction.execute(AlertingAction.java:29) > at > org.apache.geode.internal.logging.log4j.AlertAppender.append(AlertAppender.java:159) > at > org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84) > at > org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:464) > at > org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:448) > at > org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:431) > at > org.apache.logging.log4j.core.config.LoggerConfig.logParent(LoggerConfig.java:455) > at > org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:450) > at > org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:431) > at > org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:406) > at > org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63) > at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:146) at > org.apache.logging.log4j.spi.ExtendedLoggerWrapper.logMessage(ExtendedLoggerWrapper.java:217) > at > org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2170) > at > org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2125) > at > org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2108) > at > org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2002) > at > org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1974) > at > org.apache.logging.log4j.spi.AbstractLogger.fatal(AbstractLogger.java:1054) > at > org.apache.geode.internal.tcp.Connection.processNIOBuffer(Connection.java:3610) > at > org.apache.geode.internal.tcp.Connection.runNioReader(Connection.java:1824) > at org.apache.geode.internal.tcp.Connection.run(Connection.java:1686) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)