[ 
https://issues.apache.org/jira/browse/AMQ-9855?focusedWorklogId=1004874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-1004874
 ]

ASF GitHub Bot logged work on AMQ-9855:
---------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Feb/26 14:57
            Start Date: 12/Feb/26 14:57
    Worklog Time Spent: 10m 
      Work Description: cshannon commented on PR #1659:
URL: https://github.com/apache/activemq/pull/1659#issuecomment-3891444133

   The latest changes here still do not address this comment I made: 
https://github.com/apache/activemq/pull/1659#pullrequestreview-3786189384
   
   This is still marshaling, which we should not be doing; with the VM transport 
we should just be using the message copy methods.
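   
   To make that concrete, a minimal sketch of what I mean (the class and helper 
method names here are hypothetical; copy() is the real method on the command 
objects):
   
```java
// Sketch only: on vm:// there is no wire, so instead of a
// wireFormat.marshal(...)/unmarshal(...) round trip, just let the command
// object copy itself. The class and method names here are hypothetical.
import org.apache.activemq.command.ActiveMQMessage;

public class VmDispatchCopySketch {

    ActiveMQMessage copyForDispatch(ActiveMQMessage message) {
        // copy() produces an independent command instance without marshaling
        return (ActiveMQMessage) message.copy();
    }
}
```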
   
   BUT the question is still: what is the real issue, and why is it happening? 
As pointed out, **we already copy the message on dispatch to consumers**. 
   
   The test you wrote does fail without your changes, but ONLY because you are 
comparing the String instances to see if they are the same, which they won't be 
since you are using OpenWire. However, when copy() is called the String 
instances ARE the same, because Strings are immutable. If you modified your 
test to compare instance equality for the TextMessage instances, and not just 
the bodies, it would pass because we copy on dispatch.
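   
   Roughly what I mean, as a hypothetical helper (assuming JUnit 5 on the 
classpath; this is not the PR's test, it just shows which comparison actually 
tells you whether the copy on dispatch is broken):
   
```java
// Hypothetical assertion helper, not the PR's test. Over vm:// the TextMessage
// instances must differ (copy on dispatch), while String identity only tells
// you whether an OpenWire marshal/unmarshal happened: Strings are immutable,
// so a plain copy() shares the same String reference.
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotSame;

import org.apache.activemq.command.ActiveMQTextMessage;

class CopyOnDispatchAssertions {

    static void assertCopiedOnDispatch(ActiveMQTextMessage sent,
                                       ActiveMQTextMessage received) throws Exception {
        assertNotSame(sent, received);                     // the copy happened
        assertEquals(sent.getText(), received.getText());  // bodies still equal
    }
}
```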
   
   My guess is the real issue could be a race condition during the copy, or when 
the body is converted between states. The messages themselves are generally not 
thread-safe, which is why we copy on dispatch so each consumer gets its own 
copy, but some messages can toggle state: they can convert transparently 
between the marshaled body and the in-memory usable body (same with message 
properties). 
   
   @pradeep85841 - Are you only using Text messages?
   
   So for a text message, the message will store the body either as a String or 
as a buffer, and it can switch between the two. My guess is that things are 
breaking during that switch, as multiple threads are probably calling some of 
these methods concurrently:
   
https://github.com/apache/activemq/blob/649aa0ae679ec104961408b4a1363ca88d0916bd/activemq-client/src/main/java/org/apache/activemq/command/ActiveMQTextMessage.java#L73-L181
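   
   A rough diagnostic harness for that theory (written from memory, so treat the 
OpenWireFormat round trip and the numbers as approximate; the point is only the 
shape of the experiment, racing copy() against getText() on a message whose 
body still only exists as the marshaled buffer):
   
```java
// Diagnostic sketch only, not a fix. Assumes activemq-client on the classpath;
// the OpenWireFormat round trip is just one way to get a message into the
// state it is in when it arrives over the wire (content set, text still null).
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.activemq.command.ActiveMQTextMessage;
import org.apache.activemq.openwire.OpenWireFormat;
import org.apache.activemq.util.ByteSequence;

public class TextMessageRaceSketch {

    public static void main(String[] args) throws Exception {
        OpenWireFormat wireFormat = new OpenWireFormat();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger badBodies = new AtomicInteger();

        for (int i = 0; i < 10_000; i++) {
            // Build a message whose body only exists as the marshaled buffer.
            ActiveMQTextMessage original = new ActiveMQTextMessage();
            original.setText("[{\"order\":\"12345\"}]");
            ByteSequence wire = wireFormat.marshal(original);
            ActiveMQTextMessage received =
                    (ActiveMQTextMessage) wireFormat.unmarshal(wire);

            // Race copy() (what dispatch does) against getText() (what a
            // consumer does); both touch the text/content state.
            CountDownLatch start = new CountDownLatch(1);
            Callable<String> viaCopy = () -> {
                start.await();
                return ((ActiveMQTextMessage) received.copy()).getText();
            };
            Callable<String> direct = () -> {
                start.await();
                return received.getText();
            };
            Future<String> f1 = pool.submit(viaCopy);
            Future<String> f2 = pool.submit(direct);
            start.countDown();
            for (Future<String> f : List.of(f1, f2)) {
                String text = f.get();
                if (text == null || text.isEmpty()) {
                    badBodies.incrementAndGet();
                }
            }
        }
        pool.shutdown();
        System.out.println("null/empty bodies observed: " + badBodies.get());
    }
}
```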
   
   This might apply to other message types as well, but this was the most 
noticeable type because it tries to store only either the String or the byte 
buffer to save space.
   
   Going back a long time ago, I actually made some changes here to try to 
prevent issues without adding synchronization, but it's not perfect: 
https://issues.apache.org/jira/browse/AMQ-5857
   
   So we just need to figure out the exact issue. If it's the copy() that is 
being done on dispatch (maybe multiple connections are calling copy at the same 
time on dispatch), we may just need to add some sort of synchronization there. 
But we need to identify the real cause. A good first step might be to test what 
happens if you apply the "synchronized" keyword to the copy() method, and also 
to the other methods that do mutations like getText(), setText(), 
storeContent(), etc., and re-run things to see if the issue is gone, to help 
narrow it down to that.
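   
   As a compilable illustration of that experiment (this is a stand-in class, 
not a patch to ActiveMQTextMessage; it just mimics the String/buffer toggle so 
you can see that all of the mutating methods would need to share one lock):
   
```java
// Stand-in class only: mimics the ActiveMQTextMessage toggle between an
// in-memory String body and a marshaled buffer, with every method that reads
// or mutates that state synchronized on the same monitor. The real experiment
// would be adding "synchronized" to copy()/getText()/setText()/storeContent()
// in ActiveMQTextMessage and re-running the failing scenario.
import java.nio.charset.StandardCharsets;

public class SynchronizedToggleSketch {

    private String text;     // in-memory body
    private byte[] content;  // "marshaled" body

    public synchronized void setText(String text) {
        this.text = text;
        this.content = null;
    }

    // Mirrors storeContent(): keep only the marshaled form to save space.
    public synchronized void storeContent() {
        if (content == null && text != null) {
            content = text.getBytes(StandardCharsets.UTF_8);
            text = null;
        }
    }

    // Mirrors getText(): lazily convert the buffer back into a String.
    public synchronized String getText() {
        if (text == null && content != null) {
            text = new String(content, StandardCharsets.UTF_8);
        }
        return text;
    }

    // Mirrors copy(): reads both fields, so it races with the conversions
    // above unless everything holds the same lock.
    public synchronized SynchronizedToggleSketch copy() {
        SynchronizedToggleSketch copy = new SynchronizedToggleSketch();
        copy.text = this.text;
        copy.content = this.content;
        return copy;
    }
}
```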
   
Issue Time Tracking
-------------------

    Worklog Id:     (was: 1004874)
    Time Spent: 3.5h  (was: 3h 20m)

> Intermittent null/empty body when consuming from a topic (vm:// transport)
> --------------------------------------------------------------------------
>
>                 Key: AMQ-9855
>                 URL: https://issues.apache.org/jira/browse/AMQ-9855
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: AMQP, Camel
>    Affects Versions: 6.2.0, 6.1.2, 6.1.6, 6.1.7
>            Reporter: JJ
>            Priority: Major
>             Fix For: 6.3.0
>
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Also see AMQ-6708; this is very much the same issue but with more details. 
> The OP on that ticket hasn't been seen since 2017.
> We have a simple AMQ instance using Camel; it connects to an upstream remote 
> server via OpenWire and subscribes to topics. It bridges those topics to the 
> local AMQ with some later Camel processing.
> The route looks like this:
> <route id="Route_SPLITTER">
>     <from uri="remoteServer:topic:TOPIC_A?durableSubscriptionName=some.user"/>
>     <choice>
>         <when>
>             <simple>${body} == null || ${body} == ''</simple>
>             <log message="Received message with missing body: 
> ${header.CamelMessageHistory}"/>
>         </when>
>         <otherwise>
>         </otherwise>
>     </choice>
>         
>     <to uri="localAMQ:topic:MY_TOPIC_A"/>
>     <split streaming="true" >
>         <method ref="Splitter" method="processMessage"/>
>         <multicast>
>             <to uri="direct:routeSorter"/>
>         </multicast>    
>     </split>
> </route>
>  
> Logging was added to make sure it wasn't an upstream issue (and it's not).
>  
> The data being passed is formatted as arrays of JSON. The <to 
> uri="localAMQ:topic:MY_TOPIC_A"/> just passes it on untouched. The Splitter 
> sends a copy elsewhere to be filtered by an order number prefix.
> The internal Camel-to-AMQ connection is via the vm:// transport using 
> org.apache.camel.component.activemq.ActiveMQComponent (but I have also tried 
> a pooled JMS connection factory with the same results).
> When I connect a test non-durable consumer from a Ruby script using STOMP or 
> NIO, I see the same issue: some messages appear to have a 0-sized body.
> I can connect a C++ OpenWire consumer from the same server and that instance 
> gets all messages with no 0-sized bodies.
> I have tried various versions of Camel and all exhibit the same results. 
> It's also worth noting that the data sent to the splitter function reports 
> no errors either.
> I have also tried some of the older STOMP gem packages but no change. 
> (Though I have found some odd connection issues when upgrading to 
> io-wait 0.4.0 from 0.3.1.)
>  
> After much swapping things round and testing I've finally narrowed it down to 
> some issue with the vm:// transport...
> I have swapped the internal Camel connection from using vm:// to tcp:// and 
> for the last 24hrs have seen no client errors with 0-sized bodies. 
> I don't have any way to debug this deeper but hopefully someone else will 
> pick this up.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
For further information, visit: https://activemq.apache.org/contact

