[ https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530988#comment-17530988 ]
David Bennion commented on ARTEMIS-3809:
----------------------------------------

I have logs and I was able to identify the last time the message consumer gets called:

{code:java}
05/02/2022 15:19:10.440 TRACE (global-threads)) [ClientConsumerImpl ] org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl@4e3abde6{consumerContext=ActiveMQConsumerContext{id=0}, queueName=analyze}::Calling handler.onMessage
{code}

The blocked consumer is on a queue called 'analyze'. We have another queue called 'Stats' that I believe is still pushing messages.

{noformat}
05/02/2022 15:19:10.440 TRACE (0.0-2081182161)) [InVMConnection ] InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::Sending inVM packet
05/02/2022 15:19:10.440 TRACE (0.0-2081182161)) [motingConnectionImpl] RemotingConnectionID=1e5a49e7-ca54-11ec-99a3-0050568ed7da handling packet SessionIndividualAcknowledgeMessage[type=81, channelID=23, responseAsync=false, requiresResponse=false, correlationID=-1, consumerID=0, messageID=34369871410, requiresResponse=false]
05/02/2022 15:19:10.440 TRACE (0.0-2081182161)) [InVMConnection ] InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::packet sent done
05/02/2022 15:19:10.440 TRACE (0.0-2081182161)) [InVMConnection ] InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::Sending inVM packet
05/02/2022 15:19:10.440 TRACE (0.0-2081182161)) [motingConnectionImpl] RemotingConnectionID=1e5a49e7-ca54-11ec-99a3-0050568ed7da handling packet SessionSendMessage_V2[type=71, channelID=14, responseAsync=true, requiresResponse=false, correlationID=0, message=CoreMessage[messageID=0, durable=false, userID=null, priority=4, timestamp=Mon May 02 15:19:10 CDT 2022, expiration=Mon May 02 15:19:12 CDT 2022, durable=false, address=Stats, size=1552, properties=TypedProperties[...SNIPPED...]]@1327958296, requiresResponse=false, correlationID=0, requiresResponse=false]
05/02/2022 15:19:10.440 TRACE (0.0-2081182161)) [InVMConnection ] InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::packet sent done
05/02/2022 15:19:10.440 TRACE (Impl$6@f15a0d8)) [SessionPacketHandler] ServerSessionPacketHandler::handlePacket,SessionIndividualAcknowledgeMessage[type=81, channelID=23, responseAsync=false, requiresResponse=false, correlationID=-1, consumerID=0, messageID=34369871410, requiresResponse=false]
05/02/2022 15:19:10.440 TRACE (Impl$6@f15a0d8)) [ServerConsumerImpl ] individualACK messageID=34369871410
05/02/2022 15:19:10.440 TRACE (Impl$6@f15a0d8)) [ServerConsumerImpl ] individualACK starting new TX
05/02/2022 15:19:10.440 TRACE (Impl$6@f15a0d8)) [ServerConsumerImpl ] ACKing ref Reference[34369871410]:NON-RELIABLE:CoreMessage[messageID=34369871410, durable=false, userID=null, priority=5, timestamp=Mon May 02 15:19:09 CDT 2022, expiration=Mon May 02 15:20:09 CDT 2022, durable=false, address=analyze, size=9014, properties=TypedProperties[...SNIPPED...]]@1054124345 on tx=TransactionImpl [xid=null, txID=34369871912, xid=null, state=ACTIVE, createTime=1651522750440(Mon May 02 15:19:10 CDT 2022), timeoutSeconds=-1, nr operations = 0]@1df2cdae, consumer=ServerConsumerImpl [id=0, filter=null, binding=LocalQueueBinding [address=analyze, queue=QueueImpl[name=analyze, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=0.0.0.0], temp=false]@1199309c, filter=null, name=analyze, clusterName=analyze5aa9778c-98de-11ec-b671-0050568ed7da]]
05/02/2022 15:19:10.440 TRACE (Impl$6@f15a0d8)) [QueueImpl ] QueueImpl[name=analyze, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=0.0.0.0], temp=false]@1199309c acknowledge tx=true ref=Reference[34369871410]:NON-RELIABLE:CoreMessage[messageID=34369871410, durable=false, userID=null, priority=5, timestamp=Mon May 02 15:19:09 CDT 2022, expiration=Mon May 02 15:20:09 CDT 2022, durable=false, address=analyze, size=9014, properties=TypedProperties[...SNIPPED...]]@1054124345, reason=NORMAL, consumer=ServerConsumerImpl [id=0, filter=null, binding=LocalQueueBinding [address=analyze, queue=QueueImpl[name=analyze, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=0.0.0.0], temp=false]@1199309c, filter=null, name=analyze, clusterName=analyze5aa9778c-98de-11ec-b671-0050568ed7da]]
05/02/2022 15:19:10.440 INFO (Impl$6@f15a0d8)) [message ] AMQ601502: User guest(guest)@invm:0 is acknowledging a message from analyze: CoreMessage[messageID=34369871410, durable=false, userID=null, priority=5, timestamp=Mon May 02 15:19:09 CDT 2022, expiration=Mon May 02 15:20:09 CDT 2022, durable=false, address=analyze, size=9014, properties=TypedProperties[...SNIPPED...]]@1054124345
05/02/2022 15:19:10.440 DEBUG (Impl$6@f15a0d8)) [impl ] AMQ843014: acknowledged message: Reference[34369871410]:NON-RELIABLE:CoreMessage[messageID=34369871410, durable=false, userID=null, priority=5, timestamp=Mon May 02 15:19:09 CDT 2022, expiration=Mon May 02 15:20:09 CDT 2022, durable=false, address=analyze, size=9014, properties=TypedProperties[...SNIPPED...]]@1054124345, with ackReason: NORMAL
05/02/2022 15:19:10.440 INFO (Impl$6@f15a0d8)) [impl ] AMQ841014: acknowledged message ID: 34369871410, messageRef sessionID: 2767a430-ca54-11ec-99a3-0050568ed7da, with messageRef consumerID: 0, messageRef QueueName: analyze, with ackReason: NORMAL
05/02/2022 15:19:10.440 TRACE (Impl$6@f15a0d8)) [TransactionImpl ] TransactionImpl::commit::TransactionImpl [xid=null, txID=34369871912, xid=null, state=ACTIVE, createTime=1651522750440(Mon May 02 15:19:10 CDT 2022), timeoutSeconds=-1, nr operations = 1]@1df2cdae
05/02/2022 15:19:10.440 DEBUG (Impl$6@f15a0d8)) [QueueMessageMetrics ] QueuePendingMessageMetrics[queue=analyze, name=delivering] decrement messageCount to 57: Reference[34369871410]:NON-RELIABLE:CoreMessage[messageID=34369871410, durable=false, userID=null, priority=5, timestamp=Mon May 02 15:19:09 CDT 2022, expiration=Mon May 02 15:20:09 CDT 2022, durable=false, address=analyze, size=9014, properties=TypedProperties[...SNIPPED...]]@1054124345
{noformat}

This is one of several rotated logs, but it catches the moment of the last message delivery and several message deliveries prior. My log file is 50MB, so I am not certain what is relevant.

> LargeMessageControllerImpl hangs the message consume
> ----------------------------------------------------
>
> Key: ARTEMIS-3809
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3809
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.21.0
> Environment: OS: Windows Server 2019
>   JVM: OpenJDK 64-Bit Server VM Temurin-17.0.1+12
>   Max Memory (-Xmx): 6GB
>   Allocated to JVM: 4.168GB
>   Currently in use: 3.398GB (heap 3.391GB, non-heap 0.123GB)
> Reporter: David Bennion
> Priority: Major
> Labels: test-stability
>
> I wondered if this might be a recurrence of issue ARTEMIS-2293, but this happens on 2.21.0 and I can see the code change in LargeMessageControllerImpl.
>
> Using the default min-large-message-size of 100K.
>
> Many messages are passing through the broker when this happens. I would anticipate that most of the messages are smaller than 100K, but clearly some of them must exceed it. After some number of messages, a particular consumer ceases to consume messages.
>
> After the system became "hung" I was able to get a stack trace and identify that the system is stuck in an Object.wait() for a notify that appears never to come.
> Here is the trace I was able to capture:
>
> {code:java}
> Thread-2 (ActiveMQ-client-global-threads) id=78 state=TIMED_WAITING
>     - waiting on <0x43523a75> (a org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl)
>     - locked <0x43523a75> (a org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl)
>     at java.base@17.0.1/java.lang.Object.wait(Native Method)
>     at org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl.waitCompletion(LargeMessageControllerImpl.java:294)
>     at org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl.saveBuffer(LargeMessageControllerImpl.java:268)
>     at org.apache.activemq.artemis.core.client.impl.ClientLargeMessageImpl.checkBuffer(ClientLargeMessageImpl.java:157)
>     at org.apache.activemq.artemis.core.client.impl.ClientLargeMessageImpl.getBodyBuffer(ClientLargeMessageImpl.java:89)
>     at mypackage.MessageListener.handleMessage(MessageListener.java:46)
> {code}
>
> The app can run either as a single node using the InVM transport or as a cluster using TCP. To my knowledge, I have only seen this issue occur with InVM.
>
> I am not an expert in this code, but I can tell from the call stack that 0 must be the value of timeWait passed into waitCompletion(). From what I can discern of the code changes in 2.21.0, it should be adjusting the readTimeout to the timeout of the message (I think?) so that the read eventually gives up rather than remaining blocked forever.
>
> We have persistenceEnabled = false, which leads me to believe that the only disk activity for messages should be related to large messages(?).
>
> On a machine and context where this was consistently happening, I adjusted the min-large-message-size upwards and the problem went away. This makes sense for my application, but ultimately if a message crosses the threshold to become large, it appears to hang the consumer indefinitely.
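To make the suspected failure mode concrete, here is a minimal, hypothetical sketch of the guarded-wait pattern the stack trace points at. This is not the actual Artemis code; the class `WaitCompletionSketch` and its fields are invented for illustration. The point it demonstrates: in a loop around `Object.wait()`, a `timeWait` of 0 means "no deadline", so if the notifying side (the large-message stream completing) never fires, the consumer thread blocks forever.

{code:java}
// Hypothetical sketch of a waitCompletion-style guarded wait.
// With timeWait == 0 there is no deadline: the thread waits until
// streamEnded() is called, which in this bug never happens.
public class WaitCompletionSketch {
    private boolean streamEnded = false;

    // Returns true if the stream ended before the deadline,
    // false on timeout. timeWait == 0 waits indefinitely.
    public synchronized boolean waitCompletion(long timeWait) throws InterruptedException {
        long deadline = (timeWait == 0) ? Long.MAX_VALUE
                                        : System.currentTimeMillis() + timeWait;
        while (!streamEnded) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false; // deadline passed: give up instead of hanging
            }
            // With timeWait == 0, this is effectively wait(0): block forever
            wait(timeWait == 0 ? 0 : remaining);
        }
        return true;
    }

    // The "notify that never comes" in the reported hang.
    public synchronized void streamEnded() {
        streamEnded = true;
        notifyAll();
    }
}
{code}

If the reporter's reading is right, the 2.21.0 change was meant to substitute a message-derived timeout for the zero `timeWait`, and the InVM path somehow still passes 0. Raising the large-message threshold on the client (for example via the `minLargeMessageSize` URL parameter, if that matches your client configuration) keeps messages off the large-message path entirely, which is consistent with the workaround described above.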
--
This message was sent by Atlassian Jira
(v8.20.7#820007)