Erich Duda created ARTEMIS-1368:
-----------------------------------

             Summary: Artemis gets to state when it doesn't respond to producer
                 Key: ARTEMIS-1368
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-1368
             Project: ActiveMQ Artemis
          Issue Type: Bug
          Components: Broker
    Affects Versions: 2.2.0, 1.5.5
            Reporter: Erich Duda


*Scenario*
* There are two servers configured in colocated replicated HA
* There are two producers. Each one sends messages on different server to 
InQueue.
* There is MDB on server 2 which resends messages from InQueue to OutQueue
* During the resending of messages, server 2 is restarted.
* After all messages are resent, receiver is connected to server 1 and it 
receives all messages.

*Expectation:* All messages sent by producers to InQueue are received by 
receiver from OutQueue.
*Reality:* After the restart of server 2, the server 1 gets into the state when 
it stops to respond to the producer. Producer sends a bulk of messages which 
are marked as duplicates by server, but the exception packet is not sent to 
producer. See below for more detailed description what is happening on the 
server.

*Customer impact:* Artemis may get into the state when it is not able to work 
properly. This can lead to unavailability of service.

This is *regression* against *7.0.z*.

The issue wasn't reported earlier, because the test failed due to JBEAP-7968 
before 7.1.0.ER3.

*Detail description of what happened on server*

I looked at the server traces to see what happened when server received session 
commit packet from producer. Based on the traces, the server behaved correctly 
and it even scheduled to send response packet with the duplication exception.

{code}
06:17:28,206 TRACE 
[org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler] 
(Thread-6 
(ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@37386953))
 ServerSessio
nPacketHandler::scheduling response::PACKET(ActiveMQExceptionMessage)[type=20, 
channelID=0, packetObject=ActiveMQExceptionMessage, exception= 
ActiveMQDuplicateIdException[errorType=DUPLICATE_ID_REJECTED message=
Duplicate message detected - message will not be routed. Message 
information:LargeServerMessage[messageID=7034,durable=true,userID=f3b4e359-7834-11e7-bd40-001b217d6dc3,priority=4,
 timestamp=Thu Aug 03 06:17:28 E
DT 2017,expiration=0, durable=true, 
address=jms.queue.InQueue,properties=TypedProperties[__AMQ_CID=829bf1a6-7834-11e7-bd40-001b217d6dc3,count=1699,counter=1700,_AMQ_DUPL_ID=1bfb61bf-2eca-4f33-9723-5998dcd84ed515
01755380929,_AMQ_LARGE_SIZE=409615,color=RED]]@971671687]]
{code}

The problem is that after this event, I cannot find message which would say 
that the packet was sent. As you can see in code snippet below, the "scheduling 
response" says that it was registered IOCallback, which will send the packet 
once it is triggered.

{code:java}
private void sendResponse(final Packet confirmPacket,
                             final Packet response,
                             final boolean flush,
                             final boolean closeChannel) {
      if (logger.isTraceEnabled()) {
         logger.trace("ServerSessionPacketHandler::scheduling response::" + 
response);
      }

      storageManager.afterCompleteOperations(new IOCallback() {
         @Override
         public void onError(final int errorCode, final String errorMessage) {
            ActiveMQServerLogger.LOGGER.errorProcessingIOCallback(errorCode, 
errorMessage);

            ActiveMQExceptionMessage exceptionMessage = new 
ActiveMQExceptionMessage(ActiveMQExceptionType.createException(errorCode, 
errorMessage));

            doConfirmAndResponse(confirmPacket, exceptionMessage, flush, 
closeChannel);

            if (logger.isTraceEnabled()) {
               logger.trace("ServerSessionPacketHandler::exception response 
sent::" + exceptionMessage);
            }

         }

         @Override
         public void done() {
            if (logger.isTraceEnabled()) {
               logger.trace("ServerSessionPacketHandler::regular response 
sent::" + response);
            }

            doConfirmAndResponse(confirmPacket, response, flush, closeChannel);
         }
      });
   }
{code}

It is odd that the callback wasn't never triggered. This assumption confirms 
the warning printed at stopping of the server. In the OperationContext, there 
were still some callbacks which weren't triggered.

{code}
06:22:41,377 WARN  [org.apache.activemq.artemis.core.server] (ServerService 
Thread Pool -- 124) AMQ222105: Could not finish context execution in 10 
seconds: java.lang.Exception: warning
        at 
org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.waitContextCompletion(ServerSessionImpl.java:1141)
 [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
        at 
org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.closeAllServerSessions(ActiveMQServerImpl.java:1103)
 [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
        at 
org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:888)
 [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
        at 
org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:793)
 [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
        at 
org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.internalStop(ActiveMQServerImpl.java:688)
 [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
        at 
org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:681)
 [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
        at 
org.apache.activemq.artemis.jms.server.impl.JMSServerManagerImpl.stop(JMSServerManagerImpl.java:433)
 [artemis-jms-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
        at 
org.wildfly.extension.messaging.activemq.jms.JMSService.doStop(JMSService.java:217)
 [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4]
        at 
org.wildfly.extension.messaging.activemq.jms.JMSService.access$100(JMSService.java:64)
 [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4]
        at 
org.wildfly.extension.messaging.activemq.jms.JMSService$2.run(JMSService.java:121)
 [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[rt.jar:1.8.0_131]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[rt.jar:1.8.0_131]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[rt.jar:1.8.0_131]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[rt.jar:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131]
        at org.jboss.threads.JBossThread.run(JBossThread.java:320) 
[jboss-threads-2.2.1.Final-redhat-1.jar:2.2.1.Final-redhat-1]
{code}

In the attachment you can find also thread dumps from the server. I didn't find 
there any deadlocks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to