[
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Bennion updated ARTEMIS-3809:
-----------------------------------
Summary: LargeMessageControllerImpl hangs the message consume (was:
LargeMessageConsumerImpl hangs the message consume)
> LargeMessageControllerImpl hangs the message consume
> ----------------------------------------------------
>
> Key: ARTEMIS-3809
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3809
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.21.0
> Environment: OS: Windows Server 2019
> JVM: OpenJDK 64-Bit Server VM Temurin-17.0.1+12
> Max Memory (-Xmx): 6GB
> Allocated to JVM: 4.168GB
> Currently in use: 3.398GB (heap 3.391GB, non-heap 0.123GB)
> Reporter: David Bennion
> Priority: Major
> Labels: test-stability
>
> I wondered if this might be a recurrence of ARTEMIS-2293, but this happens
> on 2.21.0, where I can see that issue's code change in
> LargeMessageControllerImpl.
> We are using the default min-large-message-size of 100K.
> Many messages are passing through the broker when this happens. I would
> expect most of the messages to be smaller than 100K, but clearly some of
> them must exceed it. After some number of messages, a particular consumer
> stops consuming messages.
> After the system became "hung" I was able to get a stack trace and I was able
> to identify that the system is stuck in an Object.wait() for a notify that
> appears to never come.
> Here is the trace I was able to capture:
> {code:java}
> Thread-2 (ActiveMQ-client-global-threads) id=78 state=TIMED_WAITING
> - waiting on <0x43523a75> (a org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl)
> - locked <0x43523a75> (a org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl)
> at java.base@17.0.1/java.lang.Object.wait(Native Method)
> at org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl.waitCompletion(LargeMessageControllerImpl.java:294)
> at org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl.saveBuffer(LargeMessageControllerImpl.java:268)
> at org.apache.activemq.artemis.core.client.impl.ClientLargeMessageImpl.checkBuffer(ClientLargeMessageImpl.java:157)
> at org.apache.activemq.artemis.core.client.impl.ClientLargeMessageImpl.getBodyBuffer(ClientLargeMessageImpl.java:89)
> at mypackage.MessageListener.handleMessage(MessageListener.java:46)
> {code}
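
A trace like the one above can also be collected from inside the application rather than with an external tool. The following is a minimal sketch using only the JDK's management API; StuckWaitScanner and findWaiters are hypothetical names chosen for illustration, and nothing here is part of Artemis. It lists live threads that are in WAITING or TIMED_WAITING and have a stack frame whose class name contains a given fragment.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Artemis.
class StuckWaitScanner {

    /**
     * Returns the names of live threads that are waiting (with or without a
     * timeout) and have at least one stack frame whose class name contains
     * the given fragment.
     */
    static List<String> findWaiters(String classFragment) {
        List<String> hits = new ArrayList<>();
        // false, false: skip monitor and synchronizer details, which are not needed here
        ThreadInfo[] dump = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : dump) {
            Thread.State state = info.getThreadState();
            if (state != Thread.State.WAITING && state != Thread.State.TIMED_WAITING) {
                continue;
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().contains(classFragment)) {
                    hits.add(info.getThreadName());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // In the scenario reported here, the fragment of interest would be
        // "LargeMessageControllerImpl".
        System.out.println(findWaiters("LargeMessageControllerImpl"));
    }
}
```

Calling this periodically from a watchdog thread is one possible way to notice a consumer that has stopped making progress; a thread that shows up in several consecutive scans is a candidate for closer inspection.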
>
> The app can run either as a single node using the InVM transport or as a
> cluster using TCP. To my knowledge, I have only seen this issue occur with
> InVM.
> I am not an expert in this code, but I can tell from the call stack that 0
> must be the value of timeWait passed into waitCompletion(). From what I can
> discern of the code changes in 2.21.0, it should be adjusting the
> readTimeout to the timeout of the message (I think?) so that the read
> eventually gives up rather than remaining blocked forever.
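
As background for the timeWait discussion: in plain Java, Object.wait(0) blocks until another thread notifies, so a wait loop only gives up if it tracks a deadline itself. The sketch below is a generic, JDK-only illustration of that pattern. BoundedWait, markComplete, and this waitCompletion are hypothetical names; this is not the Artemis implementation and makes no claim about how LargeMessageControllerImpl computes its timeout.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a deadline-bounded wait; not Artemis code.
class BoundedWait {
    private boolean complete = false;

    /** Marks the work as complete and wakes every waiting thread. */
    synchronized void markComplete() {
        complete = true;
        notifyAll();
    }

    /**
     * Waits until markComplete() is called or timeoutMillis has elapsed.
     * Returns true on completion, false on timeout or interruption.
     */
    synchronized boolean waitCompletion(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!complete) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // deadline passed without a notification
            }
            // Never pass 0 to wait(): wait(0) means "wait with no timeout".
            long remainingMillis = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
            try {
                wait(remainingMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BoundedWait transfer = new BoundedWait();
        // Nobody signals completion here, so this call returns after the timeout.
        System.out.println(transfer.waitCompletion(100));
        transfer.markComplete();
        // Already complete, so this call returns without waiting.
        System.out.println(transfer.waitCompletion(100));
    }
}
```

Recomputing the remaining time on every loop iteration keeps the total wait bounded even when the thread wakes early, and the guard against a zero argument avoids accidentally turning a timed wait into an unbounded one.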
> We have persistenceEnabled = false, which leads me to believe that the only
> disk activity for messages should be related to large messages(?).
> On a machine and in a context where this was happening consistently, I
> raised the min-large-message-size and the problem went away. That works
> for my application, but ultimately, if a message crosses the threshold and
> becomes large, it appears to hang the consumer indefinitely.