[
https://issues.apache.org/jira/browse/ARTEMIS-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Justin Bertram resolved ARTEMIS-1368.
-------------------------------------
Resolution: Fixed
Fix Version/s: 2.3.0
> Artemis gets to state when it doesn't respond to producer
> ---------------------------------------------------------
>
> Key: ARTEMIS-1368
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1368
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 1.5.5, 2.2.0
> Reporter: Erich Duda
> Fix For: 2.3.0
>
>
> *Scenario*
> * There are two servers configured in colocated replicated HA
> * There are two producers. Each one sends messages on different server to
> InQueue.
> * There is MDB on server 2 which resends messages from InQueue to OutQueue
> * During the resending of messages, server 2 is restarted.
> * After all messages are resent, receiver is connected to server 1 and it
> receives all messages.
> *Expectation:* All messages sent by producers to InQueue are received by
> receiver from OutQueue.
> *Reality:* After the restart of server 2, the server 1 gets into the state
> when it stops to respond to the producer. Producer sends a bulk of messages
> which are marked as duplicates by server, but the exception packet is not
> sent to producer. See below for more detailed description what is happening
> on the server.
> *Customer impact:* Artemis may get into the state when it is not able to work
> properly. This can lead to unavailability of service.
> This is *regression* against *7.0.z*.
> The issue wasn't reported earlier, because the test failed due to JBEAP-7968
> before 7.1.0.ER3.
> *Detail description of what happened on server*
> I looked at the server traces to see what happened when server received
> session commit packet from producer. Based on the traces, the server behaved
> correctly and it even scheduled to send response packet with the duplication
> exception.
> {code}
> 06:17:28,206 TRACE
> [org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler]
> (Thread-6
> (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@37386953))
> ServerSessio
> nPacketHandler::scheduling
> response::PACKET(ActiveMQExceptionMessage)[type=20, channelID=0,
> packetObject=ActiveMQExceptionMessage, exception=
> ActiveMQDuplicateIdException[errorType=DUPLICATE_ID_REJECTED message=
> Duplicate message detected - message will not be routed. Message
> information:LargeServerMessage[messageID=7034,durable=true,userID=f3b4e359-7834-11e7-bd40-001b217d6dc3,priority=4,
> timestamp=Thu Aug 03 06:17:28 E
> DT 2017,expiration=0, durable=true,
> address=jms.queue.InQueue,properties=TypedProperties[__AMQ_CID=829bf1a6-7834-11e7-bd40-001b217d6dc3,count=1699,counter=1700,_AMQ_DUPL_ID=1bfb61bf-2eca-4f33-9723-5998dcd84ed515
> 01755380929,_AMQ_LARGE_SIZE=409615,color=RED]]@971671687]]
> {code}
> The problem is that after this event, I cannot find message which would say
> that the packet was sent. As you can see in code snippet below, the
> "scheduling response" says that it was registered IOCallback, which will send
> the packet once it is triggered.
> {code:java}
> private void sendResponse(final Packet confirmPacket,
> final Packet response,
> final boolean flush,
> final boolean closeChannel) {
> if (logger.isTraceEnabled()) {
> logger.trace("ServerSessionPacketHandler::scheduling response::" +
> response);
> }
> storageManager.afterCompleteOperations(new IOCallback() {
> @Override
> public void onError(final int errorCode, final String errorMessage) {
> ActiveMQServerLogger.LOGGER.errorProcessingIOCallback(errorCode,
> errorMessage);
> ActiveMQExceptionMessage exceptionMessage = new
> ActiveMQExceptionMessage(ActiveMQExceptionType.createException(errorCode,
> errorMessage));
> doConfirmAndResponse(confirmPacket, exceptionMessage, flush,
> closeChannel);
> if (logger.isTraceEnabled()) {
> logger.trace("ServerSessionPacketHandler::exception response
> sent::" + exceptionMessage);
> }
> }
> @Override
> public void done() {
> if (logger.isTraceEnabled()) {
> logger.trace("ServerSessionPacketHandler::regular response
> sent::" + response);
> }
> doConfirmAndResponse(confirmPacket, response, flush,
> closeChannel);
> }
> });
> }
> {code}
> It is odd that the callback wasn't never triggered. This assumption confirms
> the warning printed at stopping of the server. In the OperationContext, there
> were still some callbacks which weren't triggered.
> {code}
> 06:22:41,377 WARN [org.apache.activemq.artemis.core.server] (ServerService
> Thread Pool -- 124) AMQ222105: Could not finish context execution in 10
> seconds: java.lang.Exception: warning
> at
> org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.waitContextCompletion(ServerSessionImpl.java:1141)
> [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
> at
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.closeAllServerSessions(ActiveMQServerImpl.java:1103)
> [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
> at
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:888)
> [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
> at
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:793)
> [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
> at
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.internalStop(ActiveMQServerImpl.java:688)
> [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
> at
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:681)
> [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
> at
> org.apache.activemq.artemis.jms.server.impl.JMSServerManagerImpl.stop(JMSServerManagerImpl.java:433)
> [artemis-jms-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
> at
> org.wildfly.extension.messaging.activemq.jms.JMSService.doStop(JMSService.java:217)
> [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4]
> at
> org.wildfly.extension.messaging.activemq.jms.JMSService.access$100(JMSService.java:64)
> [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4]
> at
> org.wildfly.extension.messaging.activemq.jms.JMSService$2.run(JMSService.java:121)
> [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [rt.jar:1.8.0_131]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [rt.jar:1.8.0_131]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [rt.jar:1.8.0_131]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [rt.jar:1.8.0_131]
> at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131]
> at org.jboss.threads.JBossThread.run(JBossThread.java:320)
> [jboss-threads-2.2.1.Final-redhat-1.jar:2.2.1.Final-redhat-1]
> {code}
> In the attachment you can find also thread dumps from the server. I didn't
> find there any deadlocks.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)