[
https://issues.apache.org/jira/browse/AMQ-9855?focusedWorklogId=1005962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-1005962
]
ASF GitHub Bot logged work on AMQ-9855:
---------------------------------------
Author: ASF GitHub Bot
Created on: 19/Feb/26 00:14
Start Date: 19/Feb/26 00:14
Worklog Time Spent: 10m
Work Description: cshannon commented on PR #1659:
URL: https://github.com/apache/activemq/pull/1659#issuecomment-3923949758
> To clarify the background task involvement: in a standard serial flow, the
> Destination handles the guards correctly. However, in my
> ActiveMQTextMessageStressTest, the race occurs during high-concurrency
> scenarios where multiple threads interact with the message command.
>
> The failure specifically occurs here: Caused by: java.lang.AssertionError:
> Text should never be null during stress at line 138
>
> This trace confirms that even though the message was produced with text,
> the unmarshalled state was cleared out from under the consumer thread before it
> could be read.
>
> Because clearUnmarshalledState() is public, it can be invoked by internal
> broker components (like Advisory dispatchers or NIO transport threads) to
> reduce memory footprint. Since the current implementation doesn't check if
> content is actually populated before nulling the text, it creates this
> 'double-null' state.
>
> Hardening the command itself makes it 'safe by default,' protecting data
> integrity regardless of which internal broker component calls the cleanup

So taking a step back, at this point I'm not sure there's a real problem we
need to solve here. If there is, I haven't yet seen evidence of an actual
broker problem to fix, because the messages should be copied on dispatch.
A few things to point out:
First, the goal here should not be to change the code simply to make the
unit test pass. The unit test artificially recreates the error by manually
calling that method outside of normal operation in a multithreaded
environment. As @tabish121 pointed out, those messages were **never** intended
to be used by multiple threads, so there will be race conditions if two threads
operate on the same message. That alone makes the test invalid: the messages
are inherently not thread safe, so a stress test will break them.
Second, simply changing clearUnmarshalledState() to marshal the data when it
has not been serialized yet does not make the messages thread safe. There is
still a race condition, and the method is only safe to use from a single
thread. You could still get into trouble calling it from multiple threads, so
the change doesn't really solve the issue if two threads are touching the
message by mistake.
Third, yes, the method is public, so in theory someone could write code that
invokes it and clears the unmarshaled state without marshaling the data first.
However, a user who writes code to do that is also capable of checking that
the data has been marshaled before calling the method, just like the broker
does. If anything, it may make more sense to do a state check inside
clearUnmarshalledState() and throw an exception if the marshaled data is
missing. Anyone calling that method is likely expecting the data to already be
marshaled, and transparently marshaling would just hide an error.
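To make the state-check idea concrete, here is a minimal sketch. It uses a
hypothetical stand-in class, GuardedTextMessageSketch, not the real
ActiveMQTextMessage. The field names, storeContent(), isContentMarshalled()
and the choice of IllegalStateException are assumptions made only for
illustration; the point is just that clearing fails fast when the marshaled
bytes are missing instead of silently marshaling them.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a text message command. This is NOT the real
// ActiveMQTextMessage; names and behavior are assumptions for illustration.
class GuardedTextMessageSketch {
    private String text;     // unmarshalled (object) form of the body
    private byte[] content;  // marshalled (byte) form of the body

    void setText(String text) {
        this.text = text;
        this.content = null; // any previously marshalled bytes are now stale
    }

    // Marshal the text into bytes, as would happen before a send.
    void storeContent() {
        if (content == null && text != null) {
            content = text.getBytes(StandardCharsets.UTF_8);
        }
    }

    boolean isContentMarshalled() {
        return content != null;
    }

    // Fail fast: refuse to drop the text when no marshalled copy exists,
    // rather than transparently marshalling and hiding the caller's error.
    void clearUnmarshalledState() {
        if (text != null && content == null) {
            throw new IllegalStateException(
                "body has not been marshalled; clearing would lose it");
        }
        text = null;
    }

    // Lazily rebuild the text from the marshalled bytes when needed.
    String getText() {
        if (text == null && content != null) {
            text = new String(content, StandardCharsets.UTF_8);
        }
        return text;
    }

    public static void main(String[] args) {
        GuardedTextMessageSketch msg = new GuardedTextMessageSketch();
        msg.setText("hello");
        try {
            msg.clearUnmarshalledState(); // not marshalled yet, so rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        msg.storeContent();
        msg.clearUnmarshalledState();      // safe now
        System.out.println(msg.getText()); // rebuilt from the bytes
    }
}
```

Note that this guard only surfaces misuse from a single caller. It does not
make the object safe to share between threads.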
Lastly, this issue and PR were originally opened because of null message
bodies received in a real environment, but I have yet to see evidence or a
demonstration of how that can happen, because the messages should be copied
and unique for each consumer. I'm not saying it's impossible, but so far null
bodies have only been demonstrated through a unit test that is not really
valid: it creates a race condition that should not be possible in normal
broker operation, because those messages should always be copied.
Originally I was OK with adding synchronization because I was thinking there
was a spot in the broker where multiple threads might be interacting with the
same copy in the VM transport. However, so far that doesn't appear to be the
case, or at least we haven't found where it happens, as we should be copying
the message on dispatch. If there is a bug, the fix would be to locate the
spot where multiple threads are interacting with the same copy instead of
their own copies and fix that. Any client code that calls
clearUnmarshalledState() should make sure to do so on its own copy of the
message and should verify that it's safe to call that method (i.e. check that
it's marshaled first).
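As a rough illustration of that caller-side discipline, the sketch below
works on a private copy and only clears once the marshaled form exists. The
Msg class, its copy(), storeContent() and isContentMarshalled() methods, and
the clearIfSafe() helper are all hypothetical names invented for this
example; they only mirror the idea and are not the broker's actual code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of "copy first, check, then clear". Nothing here is
// the real ActiveMQ API; all names are assumptions for illustration.
class CopyBeforeClearSketch {

    static final class Msg {
        private String text;     // unmarshalled form
        private byte[] content;  // marshalled form

        void setText(String text) {
            this.text = text;
            this.content = null;
        }

        void storeContent() {
            if (content == null && text != null) {
                content = text.getBytes(StandardCharsets.UTF_8);
            }
        }

        boolean isContentMarshalled() {
            return content != null;
        }

        void clearUnmarshalledState() {
            text = null;
        }

        String getText() {
            if (text == null && content != null) {
                text = new String(content, StandardCharsets.UTF_8);
            }
            return text;
        }

        // Each consumer gets its own instance, so clearing one copy cannot
        // affect what another consumer reads.
        Msg copy() {
            Msg c = new Msg();
            c.text = text;
            c.content = (content == null) ? null : content.clone();
            return c;
        }
    }

    // Clear only when the marshalled bytes exist; report whether it happened.
    static boolean clearIfSafe(Msg ownCopy) {
        if (!ownCopy.isContentMarshalled()) {
            return false; // clearing now would lose the body
        }
        ownCopy.clearUnmarshalledState();
        return true;
    }

    public static void main(String[] args) {
        Msg original = new Msg();
        original.setText("payload");

        Msg mine = original.copy();            // never operate on the shared one
        System.out.println(clearIfSafe(mine)); // refused: not marshalled yet
        mine.storeContent();
        System.out.println(clearIfSafe(mine)); // allowed after marshalling
        System.out.println(mine.getText());    // rebuilt from the bytes
        System.out.println(original.getText());
    }
}
```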
Issue Time Tracking
-------------------
Worklog Id: (was: 1005962)
Time Spent: 7h 40m (was: 7.5h)
> Intermittent null/empty body when consuming from a topic (vm:// transport)
> --------------------------------------------------------------------------
>
> Key: AMQ-9855
> URL: https://issues.apache.org/jira/browse/AMQ-9855
> Project: ActiveMQ
> Issue Type: Bug
> Components: AMQP, Camel
> Affects Versions: 6.2.0, 6.1.2, 6.1.6, 6.1.7
> Reporter: JJ
> Priority: Major
> Fix For: 6.3.0
>
> Time Spent: 7h 40m
> Remaining Estimate: 0h
>
> Also see AMQ-6708. This is very much the same issue but with more details. The
> OP on that ticket hasn't been seen since 2017.
> We have a simple AMQ instance using Camel. It connects to an upstream remote
> server via OpenWire and subscribes to topics. It bridges those topics to the
> local AMQ with some later Camel processing.
> The route looks like this:
> <route id="Route_SPLITTER">
>   <from uri="remoteServer:topic:TOPIC_A?durableSubscriptionName=some.user"/>
>   <choice>
>     <when>
>       <simple>${body} == null || ${body} == ''</simple>
>       <log message="Received message with missing body: ${header.CamelMessageHistory}"/>
>     </when>
>     <otherwise>
>     </otherwise>
>   </choice>
>
>   <to uri="localAMQ:topic:MY_TOPIC_A"/>
>   <split streaming="true">
>     <method ref="Splitter" method="processMessage"/>
>     <multicast>
>       <to uri="direct:routeSorter"/>
>     </multicast>
>   </split>
> </route>
>
> Logging was added to make sure it wasn't an upstream issue (and it's not).
>
> The data being passed is formatted as arrays of JSON. The <to
> uri="localAMQ:topic:MY_TOPIC_A"/> just passes it untouched. The Splitter sends
> a copy elsewhere to be filtered by an order number prefix.
> The internal Camel to AMQ connection is via the vm:// transport using
> org.apache.camel.component.activemq.ActiveMQComponent (but I have also tried
> a pooled JMS connection factory with the same results).
> When I connect a test non-durable consumer from a Ruby script using STOMP or
> NIO, I see the same issue. Some messages appear to have a 0 sized body.
> I can connect a C++ OpenWire consumer from the same server and that
> instance gets all messages with no 0 size bodies.
> I have tried various versions of Camel and all exhibit the same results.
> It's also worth noting that the data sent to the splitter function reports
> no errors either.
> I have also tried some of the older STOMP gem packages but no change. (Though
> I have found some odd connection issue when you upgrade to io-wait-0.4.0 from
> 0.3.1.)
>
> After much swapping things around and testing, I've finally narrowed it down
> to some issue with the vm:// transport...
> I have swapped the internal Camel connection from using vm:// to tcp:// and
> for the last 24 hours have seen no client errors with 0 sized bodies.
> I don't have any way to debug this deeper but hopefully someone else will
> pick this up.